From: schwarze
Date: Sun, 11 Aug 2024 18:24:43 +0000 (+0000)
Subject: Even though US-ASCII (= ANSI X3.4-1986) only defines 128 characters,
X-Git-Url: http://artulab.com/gitweb/?a=commitdiff_plain;h=76e9942174fbc100685fafc74b4502e83f77ee74;p=openbsd

Even though US-ASCII (= ANSI X3.4-1986) only defines 128 characters,
the POSIX standard explicitly requires in section 6.2 that "the POSIX
locale shall contain 256 single-byte characters", see:

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap06.html#tag_06_02

So the current behaviour of treating non-ASCII bytes in an
LC_CTYPE=POSIX input stream as if they were characters is not a POSIX
violation, but actually required by the standard - and not just for
awk(1), but for utility programs in general and even for library
functions in general.

Consequently, delete the wrong sentence i added to the STANDARDS
section last year.

Thanks to millert@ and jmc@ for making me realize my mistake.
OK millert@ jmc@
---

diff --git a/usr.bin/awk/awk.1 b/usr.bin/awk/awk.1
index 33f21d86a98..9fa1b4b5da0 100644
--- a/usr.bin/awk/awk.1
+++ b/usr.bin/awk/awk.1
@@ -1,4 +1,4 @@
-.\" $OpenBSD: awk.1,v 1.69 2024/07/30 13:55:11 jmc Exp $
+.\" $OpenBSD: awk.1,v 1.70 2024/08/11 18:24:43 schwarze Exp $
 .\"
 .\" Copyright (C) Lucent Technologies 1997
 .\" All Rights Reserved
@@ -22,7 +22,7 @@
 .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
 .\" THIS SOFTWARE.
 .\"
-.Dd $Mdocdate: July 30 2024 $
+.Dd $Mdocdate: August 11 2024 $
 .Dt AWK 1
 .Os
 .Sh NAME
@@ -1041,11 +1041,6 @@ and
 .Fn srand
 has been changed to support non-deterministic random numbers.
 .Pp
-In
-.Ev LC_CTYPE Ns Li =POSIX
-mode, treating non-ASCII input bytes as non-letter characters rather
-than as input encoding errors intentionally violates the specification.
-.Pp
 The flags
 .Op Fl \&dV ,
 .Op Fl -csv ,