From: schwarze
Date: Tue, 13 Oct 2015 23:30:42 +0000 (+0000)
Subject: Reject the escape sequences \[uD800] to \[uDFFF] in the parser.
X-Git-Url: http://artulab.com/gitweb/?a=commitdiff_plain;h=136612f731496137b54d6ed411cabaf71efd113a;p=openbsd

Reject the escape sequences \[uD800] to \[uDFFF] in the parser.
These surrogates are not valid Unicode codepoints,
so treat them just like any other undefined character escapes:
Warn about them and do not produce output.
Issue noticed while talking to stsp@, semarie@, and bentley@.
---

diff --git a/regress/usr.bin/mandoc/char/unicode/input.out_ascii b/regress/usr.bin/mandoc/char/unicode/input.out_ascii
index a9946d1b528..7711574c61d 100644
--- a/regress/usr.bin/mandoc/char/unicode/input.out_ascii
+++ b/regress/usr.bin/mandoc/char/unicode/input.out_ascii
@@ -37,8 +37,8 @@ DDEESSCCRRIIPPTTIIOONN
 U+CFFF 0xecbfbf end of last normal middle byte
 U+D000 0xed8080 begin of strange middle byte
 U+D7FF 0xed9fbf highest public three-byte
-U+D800 0xeda080 ??? lowest surrogate
-U+DFFF 0xedbfbf ??? highest surrogate
+U+D800 0xeda080 ??? lowest surrogate
+U+DFFF 0xedbfbf ??? highest surrogate
 U+E000 0xee8080 lowest private use
 U+FFFF 0xefbfbf highest three-byte
diff --git a/regress/usr.bin/mandoc/char/unicode/input.out_lint b/regress/usr.bin/mandoc/char/unicode/input.out_lint
index 77b6161cbab..8ac05edcef0 100644
--- a/regress/usr.bin/mandoc/char/unicode/input.out_lint
+++ b/regress/usr.bin/mandoc/char/unicode/input.out_lint
@@ -24,9 +24,11 @@ mandoc: input.in:34:19: ERROR: skipping bad character: 0xbf
 mandoc: input.in:41:25: ERROR: skipping bad character: 0xed
 mandoc: input.in:41:26: ERROR: skipping bad character: 0xa0
 mandoc: input.in:41:27: ERROR: skipping bad character: 0x80
+mandoc: input.in:41:17: WARNING: invalid escape sequence: \[uD800]
 mandoc: input.in:42:25: ERROR: skipping bad character: 0xed
 mandoc: input.in:42:26: ERROR: skipping bad character: 0xbf
 mandoc: input.in:42:27: ERROR: skipping bad character: 0xbf
+mandoc: input.in:42:17: WARNING: invalid escape sequence: \[uDFFF]
 mandoc: input.in:50:19: ERROR: skipping bad character: 0xf0
 mandoc: input.in:50:20: ERROR: skipping bad character: 0x80
 mandoc: input.in:50:21: ERROR: skipping bad character: 0x80
diff --git a/regress/usr.bin/mandoc/char/unicode/input.out_utf8 b/regress/usr.bin/mandoc/char/unicode/input.out_utf8
index 44813b8d7ae..89aa6719533 100644
--- a/regress/usr.bin/mandoc/char/unicode/input.out_utf8
+++ b/regress/usr.bin/mandoc/char/unicode/input.out_utf8
@@ -37,8 +37,8 @@ DDEESSCCRRIIPPTTIIOONN
 U+CFFF 0xecbfbf 쿿쿿 end of last normal middle byte
 U+D000 0xed8080 퀀퀀 begin of strange middle byte
 U+D7FF 0xed9fbf ퟿퟿ highest public three-byte
-U+D800 0xeda080 í €??? lowest surrogate
-U+DFFF 0xedbfbf í¿¿??? highest surrogate
+U+D800 0xeda080 ??? lowest surrogate
+U+DFFF 0xedbfbf ??? highest surrogate
 U+E000 0xee8080  lowest private use
 U+FFFF 0xefbfbf ï¿¿ï¿¿ highest three-byte
diff --git a/usr.bin/mandoc/mandoc.c b/usr.bin/mandoc/mandoc.c
index 2184c17c052..3e56edf3abd 100644
--- a/usr.bin/mandoc/mandoc.c
+++ b/usr.bin/mandoc/mandoc.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: mandoc.c,v 1.63 2015/10/12 00:07:27 schwarze Exp $ */
+/* $OpenBSD: mandoc.c,v 1.64 2015/10/13 23:30:42 schwarze Exp $ */
 /*
  * Copyright (c) 2008-2011, 2014 Kristaps Dzonsons
  * Copyright (c) 2011-2015 Ingo Schwarze
@@ -331,6 +331,9 @@ mandoc_escape(const char **end, const char **start, int *sz)
 			break;
 		if (*sz == 6 && (*start)[1] == '0')
 			break;
+		if (*sz == 5 && (*start)[1] == 'D' &&
+		    strchr("89ABCDEF", (*start)[2]) != NULL)
+			break;
 		if ((int)strspn(*start + 1, "0123456789ABCDEFabcdef")
 		    + 1 == *sz)
 			gly = ESCAPE_UNICODE;
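
The check added to mandoc_escape() can be tried in isolation. The following is only a minimal sketch under stated assumptions: is_surrogate_name() and example_checks() are hypothetical names that do not exist in mandoc, and the helper takes the escape name (starting at the 'u') and its length directly instead of the *start and *sz pointers used in the patch. Like the patch, it matches uppercase hex digits only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: report whether the name of
 * a \[uXXXX] escape denotes a surrogate, U+D800 to U+DFFF.
 * "name" points at the 'u'; "sz" is the length of the name including
 * the 'u', so exactly four hex digits give sz == 5.
 */
int
is_surrogate_name(const char *name, int sz)
{
	/* Surrogates always have exactly four hex digits. */
	if (sz != 5 || name[0] != 'u')
		return 0;
	/* The first digit must be D; guard against a short string. */
	if (name[1] != 'D' || name[2] == '\0')
		return 0;
	/* The second digit must be in the range 8 to F. */
	return strchr("89ABCDEF", name[2]) != NULL;
}

/* Boundary cases taken from the regression test in the diff. */
void
example_checks(void)
{
	assert(is_surrogate_name("uD7FF", 5) == 0);	/* highest public three-byte */
	assert(is_surrogate_name("uD800", 5) == 1);	/* lowest surrogate */
	assert(is_surrogate_name("uDFFF", 5) == 1);	/* highest surrogate */
	assert(is_surrogate_name("uE000", 5) == 0);	/* lowest private use */
}
```

Testing the first two hex digits is sufficient here because every code point from U+D800 to U+DFFF, and no other four-digit code point, starts with D followed by a digit in the range 8 to F.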