From: millert Date: Thu, 25 Apr 2024 18:33:53 +0000 (+0000) Subject: Update awk to the Apr 22, 2024 version. X-Git-Url: http://artulab.com/gitweb/?a=commitdiff_plain;h=85e00e3040cd85d427c43140c50b9dd688cf81dd;p=openbsd Update awk to the Apr 22, 2024 version. * fixed regex engine gototab reallocation issue that was introduced during the Nov 24 rewrite. * fixed use-after-free bug in fnematch due to adjbuf invalidating the pointers to buf. --- diff --git a/usr.bin/awk/FIXES b/usr.bin/awk/FIXES index 3b059250d5c..c4eef3bd8ea 100644 --- a/usr.bin/awk/FIXES +++ b/usr.bin/awk/FIXES @@ -25,19 +25,33 @@ THIS SOFTWARE. This file lists all bug fixes, changes, etc., made since the second edition of the AWK book was published in September 2023. +Apr 22, 2024: + fixed regex engine gototab reallocation issue that was + introduced during the Nov 24 rewrite. Thanks to Arnold Robbins. + Fixed a scan bug in split in the case the separator is a single + character. thanks to Oguz Ismail for spotting the issue. + +Mar 10, 2024: + fixed use-after-free bug in fnematch due to adjbuf invalidating + the pointers to buf. thanks to github user caffe3 for spotting + the issue and providing a fix, and to Miguel Pineiro Jr. + for the alternative fix. + MAX_UTF_BYTES in fnematch has been replaced with awk_mb_cur_max. + thanks to Miguel Pineiro Jr. + Jan 22, 2024: Restore the ability to compile with g++. Thanks to Arnold Robbins. Dec 24, 2023: - matchop dereference after free problem fix when the first - argument is a function call. thanks to Oguz Ismail Uysal. + Matchop dereference after free problem fix when the first + argument is a function call. Thanks to Oguz Ismail Uysal. Fix inconsistent handling of --csv and FS set in the command line. Thanks to Wilbert van der Poel. - casting changes to int for is* functions. + Casting changes to int for is* functions. Nov 27, 2023: - Fix exit status of system on MacOS. update to REGRESS. + Fix exit status of system on MacOS. Update to REGRESS. Thanks to Arnold Robbins. Fix inconsistent handling of -F and --csv, and loss of csv mode when FS is set. @@ -45,7 +59,7 @@ Nov 27, 2023: Nov 24, 2023: Fix issue #199: gototab improvements to dynamically resize the table, qsort and bsearch to improve the lookup speed as the - table gets larger for multibyte input. thanks to Arnold Robbins. + table gets larger for multibyte input. Thanks to Arnold Robbins. Nov 23, 2023: Fix Issue #169, related to escape sequences in strings. @@ -54,29 +68,29 @@ Nov 23, 2023: by Miguel Pineiro Jr. Nov 20, 2023: - rewrite of fnematch to fix a number of issues, including + Rewrite of fnematch to fix a number of issues, including extraneous output, out-of-bounds access, number of bytes to push back after a failed match etc. - thanks to Miguel Pineiro Jr. + Thanks to Miguel Pineiro Jr. Nov 15, 2023: - Man page edit, regression test fixes. thanks to Arnold Robbins - consolidation of sub and gsub into dosub, removing duplicate - code. thanks to Miguel Pineiro Jr. + Man page edit, regression test fixes. Thanks to Arnold Robbins + Consolidation of sub and gsub into dosub, removing duplicate + code. Thanks to Miguel Pineiro Jr. gcc replaced with cc everywhere. Oct 30, 2023: - multiple fixes and a minor code cleanup. - disabled utf-8 for non-multibyte locales, such as C or POSIX. - fixed a bad char * cast that causes incorrect results on big-endian - systems. also fixed an out-of-bounds read for empty CCL. - fixed a buffer overflow in substr with utf-8 strings. - many thanks to Todd C Miller. + Multiple fixes and a minor code cleanup. + Disabled utf-8 for non-multibyte locales, such as C or POSIX. + Fixed a bad char * cast that causes incorrect results on big-endian + systems. Also fixed an out-of-bounds read for empty CCL. + Fixed a buffer overflow in substr with utf-8 strings. + Many thanks to Todd C Miller. Sep 24, 2023: fnematch and getrune have been overhauled to solve issues around - unicode FS and RS. also fixed gsub null match issue with unicode. - big thanks to Arnold Robbins. + unicode FS and RS. Also fixed gsub null match issue with unicode. + Big thanks to Arnold Robbins. Sep 12, 2023: Fixed a length error in u8_byte2char that set RSTART to @@ -101,9 +115,8 @@ Sep 12, 2023: of a string of 3 emojis is 3, not 12 as it would be if bytes were counted. - Regular expressions are processes as UTF-8. + Regular expressions are processed as UTF-8. Unicode literals can be written as \u followed by one to eight hexadecimal digits. These may appear in strings and regular expressions. - diff --git a/usr.bin/awk/README.md b/usr.bin/awk/README.md index e83be10d55a..ea232e87689 100644 --- a/usr.bin/awk/README.md +++ b/usr.bin/awk/README.md @@ -38,6 +38,7 @@ Regular expressions may include UTF-8 code points, including `\u`. The option `--csv` turns on CSV processing of input: fields are separated by commas, fields may be quoted with double-quote (`"`) characters, quoted fields may contain embedded newlines. +Double-quotes in fields have to be doubled and enclosed in quoted fields. In CSV mode, `FS` is ignored. If no explicit separator argument is provided, @@ -112,6 +113,8 @@ move this to some place like `/usr/bin/awk`. If your system does not have `yacc` or `bison` (the GNU equivalent), you need to install one of them first. +The default in the `makefile` is `bison`; you will have +to edit the `makefile` to use `yacc`. NOTE: This version uses ISO/IEC C99, as you should also. We have compiled this without any changes using `gcc -Wall` and/or local C @@ -131,4 +134,4 @@ We don't usually do releases. #### Last Updated -Mon 30 Oct 2023 12:53:07 MDT +Mon 05 Feb 2024 08:46:55 IST diff --git a/usr.bin/awk/b.c b/usr.bin/awk/b.c index bc3f06fd320..89f4918b7a0 100644 --- a/usr.bin/awk/b.c +++ b/usr.bin/awk/b.c @@ -1,4 +1,4 @@ -/* $OpenBSD: b.c,v 1.50 2024/01/25 16:40:51 millert Exp $ */ +/* $OpenBSD: b.c,v 1.51 2024/04/25 18:33:53 millert Exp $ */ /**************************************************************** Copyright (C) Lucent Technologies 1997 All Rights Reserved @@ -613,11 +613,11 @@ static void resize_gototab(fa *f, int state) size_t orig_size = f->gototab[state].allocated; // 2nd half of new mem is this size memset(p + orig_size, 0, orig_size * sizeof(gtte)); // clean it out - f->gototab[state].allocated = new_size; // update gotottab info + f->gototab[state].allocated = new_size; // update gototab info f->gototab[state].entries = p; } -static int get_gototab(fa *f, int state, int ch) /* hide gototab inplementation */ +static int get_gototab(fa *f, int state, int ch) /* hide gototab implementation */ { gtte key; gtte *item; @@ -644,7 +644,7 @@ static int entry_cmp(const void *l, const void *r) return left->ch - right->ch; } -static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplementation */ +static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab implementation */ { if (f->gototab[state].inuse == 0) { f->gototab[state].entries[0].ch = ch; @@ -657,8 +657,8 @@ static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplem if (tab->inuse + 1 >= tab->allocated) resize_gototab(f, state); - f->gototab[state].entries[f->gototab[state].inuse-1].ch = ch; - f->gototab[state].entries[f->gototab[state].inuse-1].state = val; + f->gototab[state].entries[f->gototab[state].inuse].ch = ch; + f->gototab[state].entries[f->gototab[state].inuse].state = val; f->gototab[state].inuse++; return val; } else { @@ -683,9 +683,9 @@ static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplem gtt *tab = & f->gototab[state]; if (tab->inuse + 1 >= tab->allocated) resize_gototab(f, state); - ++tab->inuse; f->gototab[state].entries[tab->inuse].ch = ch; f->gototab[state].entries[tab->inuse].state = val; + ++tab->inuse; qsort(f->gototab[state].entries, f->gototab[state].inuse, sizeof(gtte), entry_cmp); @@ -836,8 +836,6 @@ int nematch(fa *f, const char *p0) /* non-empty match, for sub */ } -#define MAX_UTF_BYTES 4 // UTF-8 is up to 4 bytes long - /* * NAME * fnematch @@ -874,16 +872,28 @@ bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum) do { /* - * Call u8_rune with at least MAX_UTF_BYTES ahead in + * Call u8_rune with at least awk_mb_cur_max ahead in * the buffer until EOF interferes. */ - if (k - j < MAX_UTF_BYTES) { - if (k + MAX_UTF_BYTES > buf + bufsize) { + if (k - j < awk_mb_cur_max) { + if (k + awk_mb_cur_max > buf + bufsize) { + char *obuf = buf; adjbuf(&buf, &bufsize, - bufsize + MAX_UTF_BYTES, + bufsize + awk_mb_cur_max, quantum, 0, "fnematch"); + + /* buf resized, maybe moved. update pointers */ + *pbufsize = bufsize; + if (obuf != buf) { + i = buf + (i - obuf); + j = buf + (j - obuf); + k = buf + (k - obuf); + *pbuf = buf; + if (patlen) + patbeg = buf + (patbeg - obuf); + } } - for (n = MAX_UTF_BYTES ; n > 0; n--) { + for (n = awk_mb_cur_max ; n > 0; n--) { *k++ = (c = getc(f)) != EOF ? c : 0; if (c == EOF) { if (ferror(f)) @@ -920,10 +930,6 @@ bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum) s = 2; } while (1); - /* adjbuf() may have relocated a resized buffer. Inform the world. */ - *pbuf = buf; - *pbufsize = bufsize; - if (patlen) { /* * Under no circumstances is the last character fed to diff --git a/usr.bin/awk/main.c b/usr.bin/awk/main.c index 481e98b078a..11296ce12bf 100644 --- a/usr.bin/awk/main.c +++ b/usr.bin/awk/main.c @@ -1,4 +1,4 @@ -/* $OpenBSD: main.c,v 1.68 2024/01/25 16:40:51 millert Exp $ */ +/* $OpenBSD: main.c,v 1.69 2024/04/25 18:33:53 millert Exp $ */ /**************************************************************** Copyright (C) Lucent Technologies 1997 All Rights Reserved @@ -23,7 +23,7 @@ ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ****************************************************************/ -const char *version = "version 20240122"; +const char *version = "version 20240422"; #define DEBUG #include diff --git a/usr.bin/awk/run.c b/usr.bin/awk/run.c index 73ab148952b..bf24e29bc73 100644 --- a/usr.bin/awk/run.c +++ b/usr.bin/awk/run.c @@ -1,4 +1,4 @@ -/* $OpenBSD: run.c,v 1.84 2024/01/25 16:40:51 millert Exp $ */ +/* $OpenBSD: run.c,v 1.85 2024/04/25 18:33:53 millert Exp $ */ /**************************************************************** Copyright (C) Lucent Technologies 1997 All Rights Reserved @@ -1831,7 +1831,7 @@ Cell *split(Node **a, int nnn) /* split(a[0], a[1], a[2]); a[3] is type */ for (;;) { n++; t = s; - while (*s != sep && *s != '\n' && *s != '\0') + while (*s != sep && *s != '\0') s++; temp = *s; setptr(s, '\0'); @@ -2527,7 +2527,7 @@ void backsub(char **pb_ptr, const char **sptr_ptr); Cell *dosub(Node **a, int subop) /* sub and gsub */ { fa *pfa; - int tempstat; + int tempstat = 0; char *repl; Cell *x;