By definition, the range of valid Unicode code points is the union of
U+0000..U+D7FF and U+E000..U+10FFFF (see Unicode 8.0.0, chapter 3.9).
In UTF-16, the encoded values that would represent U+D800..U+DFFF are
used for surrogate pairs. UTF-8 has no concept of surrogate pairs;
attempting to treat them as regular code points violates the standard
and makes no sense besides.
ok stsp@
-/* $OpenBSD: citrus_utf8.c,v 1.13 2015/10/12 17:50:51 schwarze Exp $ */
+/* $OpenBSD: citrus_utf8.c,v 1.14 2015/10/13 02:17:46 bentley Exp $ */
/*-
* Copyright (c) 2002-2004 Tim J. Robbins
return (1);
}
- if (wc < 0 || wc > 0x10ffff) {
+ if (wc < 0 || (wc > 0xd7ff && wc < 0xe000) || wc > 0x10ffff) {
errno = EILSEQ;
return ((size_t)-1);
}