artulab.com Git - openbsd/commit

author	schwarze <schwarze@openbsd.org>
	Fri, 19 Dec 2014 04:57:11 +0000 (04:57 +0000)
committer	schwarze <schwarze@openbsd.org>
	Fri, 19 Dec 2014 04:57:11 +0000 (04:57 +0000)
commit	52a7f4662432db837ecf4c838b4be59349e5f106
tree	24e8b6acc4769d14ce05a77e8bfead8f04f14efd
parent	762cb5c93628166ca5d600c5ac208692566aceff

Rewrite the low-level UTF-8 parser from scratch.
It accepted invalid byte sequences such as 0xc080-c1bf, 0xe08080-e09fbf,
0xeda080-edbfbf, and 0xf0808080-f08fbfbf (overlong encodings and, in
the third range, UTF-16 surrogates), produced valid roff Unicode
escape sequences from them, and the algorithm contained strong
defenses against any attempt to fix it.
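A strict decoder has to reject every one of the byte ranges listed above. The following is a minimal sketch in C, not the code from preconv.c: the function name `utf8_decode_strict()` and its interface are hypothetical. It applies the usual checks: a minimum code point per sequence length (which rejects overlong forms), the surrogate range U+D800 to U+DFFF, and the upper bound U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strict UTF-8 decoder (not mandoc's preconv.c code).
 * Decodes one character from s[0..len-1].  On success, stores the
 * code point in *cp and returns the number of bytes consumed (1-4).
 * Returns 0 for any invalid sequence: stray continuation bytes,
 * invalid lead bytes, truncated sequences, overlong encodings,
 * UTF-16 surrogates, and values above U+10FFFF.
 */
static size_t
utf8_decode_strict(const unsigned char *s, size_t len, uint32_t *cp)
{
	uint32_t	 acc, min;
	size_t		 need, i;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {		/* plain ASCII */
		*cp = s[0];
		return 1;
	}
	if (s[0] >= 0xc0 && s[0] < 0xe0) {
		need = 2; acc = s[0] & 0x1f; min = 0x80;
	} else if (s[0] >= 0xe0 && s[0] < 0xf0) {
		need = 3; acc = s[0] & 0x0f; min = 0x800;
	} else if (s[0] >= 0xf0 && s[0] < 0xf8) {
		need = 4; acc = s[0] & 0x07; min = 0x10000;
	} else
		return 0;		/* continuation or invalid lead byte */
	if (len < need)
		return 0;		/* truncated sequence */
	for (i = 1; i < need; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		acc = (acc << 6) | (s[i] & 0x3f);
	}
	if (acc < min)
		return 0;		/* overlong, e.g. 0xc0 0x80 */
	if (acc >= 0xd800 && acc <= 0xdfff)
		return 0;		/* surrogate, e.g. 0xed 0xa0 0x80 */
	if (acc > 0x10ffff)
		return 0;		/* beyond the Unicode range */
	*cp = acc;
	return need;
}
```

For example, this sketch returns 0 for 0xc0 0x80 (an overlong NUL) and for 0xed 0xa0 0x80 (the surrogate U+D800), but returns 2 and stores U+00E9 for 0xc3 0xa9.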

This cures an assertion failure in the terminal formatter, found by
jsg@ with afl, that was triggered by sneaking in ASCII 0x08 (backspace)
"encoded" as an (invalid) multibyte UTF-8 sequence.

As a bonus, the new algorithm also makes the function about 20%
shorter.

regress/usr.bin/mandoc/char/unicode/Makefile
regress/usr.bin/mandoc/char/unicode/input.in	[new file with mode: 0644]
regress/usr.bin/mandoc/char/unicode/input.out_ascii	[new file with mode: 0644]
regress/usr.bin/mandoc/char/unicode/input.out_lint	[new file with mode: 0644]
regress/usr.bin/mandoc/char/unicode/input.out_utf8	[new file with mode: 0644]
usr.bin/mandoc/preconv.c