author	schwarze <schwarze@openbsd.org>
	Tue, 14 May 2024 21:12:44 +0000 (21:12 +0000)
committer	schwarze <schwarze@openbsd.org>
	Tue, 14 May 2024 21:12:44 +0000 (21:12 +0000)
commit	a83ec1761e33fc9534627a36ca0f26bbd427f3ab
tree	2294782adb2fd93bd880f21ac437be119f181be4
parent	b44c3c0a4c7ab272ed37e0794ec3756a47467f66

Garbage collect dead code intended to write five- and six-byte UTF-8
sequences since the Unicode standard has been explicitly prohibiting
the use of such sequences when encoding Unicode characters for more
than 20 years now.

While here, also weed out UTF-16 surrogates and codepoints in the
invalid range 110000 to 1FFFFF if any are encountered. I hoped to
write "no functional change", but to my shame it turns out there
are unrelated bugs with \[uXXXX] parsing in roff_escape.c, so this
new anti-surrogate check is actually reachable until those other
bugs get fixed, and even after fixing those other bugs, it will
remain useful as a defense in depth.
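
For illustration only, here is a minimal sketch of an encoder with the
properties described above. It is not the mandoc code touched by this
commit; the function name utf8_encode() and its interface are
hypothetical. It emits at most four bytes and rejects both UTF-16
surrogates and codepoints above 10FFFF. The upper check is needed
because the four-byte form carries 21 payload bits and could otherwise
represent values up to 1FFFFF.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the function changed by this commit.
 * Encode the codepoint cp as UTF-8 into buf, which must have room
 * for at least 4 bytes.  Return the number of bytes written,
 * or 0 if cp is a UTF-16 surrogate or lies beyond U+10FFFF.
 */
size_t
utf8_encode(unsigned int cp, unsigned char *buf)
{
	/* UTF-16 surrogates are not valid Unicode scalar values. */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;

	/* Unicode ends at 10FFFF; no five- or six-byte forms exist. */
	if (cp > 0x10FFFF)
		return 0;

	if (cp < 0x80) {
		/* One byte: 0xxxxxxx */
		buf[0] = cp;
		return 1;
	}
	if (cp < 0x800) {
		/* Two bytes: 110xxxxx 10xxxxxx */
		buf[0] = 0xC0 | (cp >> 6);
		buf[1] = 0x80 | (cp & 0x3F);
		return 2;
	}
	if (cp < 0x10000) {
		/* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
		buf[0] = 0xE0 | (cp >> 12);
		buf[1] = 0x80 | ((cp >> 6) & 0x3F);
		buf[2] = 0x80 | (cp & 0x3F);
		return 3;
	}

	/* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
	buf[0] = 0xF0 | (cp >> 18);
	buf[1] = 0x80 | ((cp >> 12) & 0x3F);
	buf[2] = 0x80 | ((cp >> 6) & 0x3F);
	buf[3] = 0x80 | (cp & 0x3F);
	return 4;
}
```

In this sketch a caller treats a return value of 0 as "do not emit
anything for this codepoint". How the real code reports or replaces
rejected input is not stated in this commit message.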