This file lists all bug fixes, changes, etc., made since the
second edition of the AWK book was published in September 2023.
+Apr 22, 2024:
+ Fixed regex engine gototab reallocation issue that was
+ introduced during the Nov 24 rewrite. Thanks to Arnold Robbins.
+ Fixed a scan bug in split in the case where the separator is a single
+ character. Thanks to Oguz Ismail for spotting the issue.
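The split fix drops the `*s != '\n'` test from the single-character scan, so an embedded newline is ordinary field data rather than an extra field boundary. A minimal sketch of the fixed scan, assuming a hypothetical helper `count_split_fields` that only counts fields instead of filling an awk array:

```c
#include <stddef.h>

/* Hypothetical helper, not awk's split(): count the fields produced by
 * splitting s on the single character sep. As in the fixed scan loop,
 * only sep and the terminating NUL end a field; '\n' is ordinary data. */
static size_t count_split_fields(const char *s, char sep)
{
    size_t n = 0;

    if (*s == '\0')                         /* assumed: empty string has no fields */
        return 0;
    for (;;) {
        n++;                                /* start a new field */
        while (*s != sep && *s != '\0')     /* scan to the end of the field */
            s++;
        if (*s == '\0')                     /* no separator left */
            return n;
        s++;                                /* step over the separator */
    }
}
```

Here `count_split_fields("a\nb:c", ':')` is 2; with the removed `'\n'` test the scan would also have stopped at the newline and produced three fields.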
+
+Mar 10, 2024:
+ Fixed use-after-free bug in fnematch due to adjbuf invalidating
+ the pointers to buf. Thanks to GitHub user caffe3 for spotting
+ the issue and providing a fix, and to Miguel Pineiro Jr.
+ for the alternative fix.
+ MAX_UTF_BYTES in fnematch has been replaced with awk_mb_cur_max.
+ Thanks to Miguel Pineiro Jr.
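The use-after-free arises because `adjbuf` may move `buf`, leaving interior pointers such as `i`, `j` and `k` aimed at freed memory; the repair is to rebuild every such pointer against the new base. A sketch with a hypothetical `struct cursors` and `grow_and_rebase`, using plain `realloc` in place of awk's `adjbuf`. Unlike the patch, which subtracts the saved `obuf` after the call, it records offsets before reallocating, so the stale pointer value is never used:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with interior pointers, loosely modelled on the
 * i, j and k pointers that fnematch keeps into buf. */
struct cursors {
    char *buf;
    size_t size;
    char *i, *j, *k;
};

/* Grow c->buf to newsize bytes. realloc may move the block, which would
 * leave i, j and k pointing into freed memory, so their offsets are saved
 * first and the pointers are rebuilt against the new base.
 * Returns 0 on success, -1 if realloc fails (the old buffer stays valid). */
static int grow_and_rebase(struct cursors *c, size_t newsize)
{
    size_t offi, offj, offk;
    char *nbuf;

    if (newsize <= c->size)
        return 0;
    offi = (size_t)(c->i - c->buf);
    offj = (size_t)(c->j - c->buf);
    offk = (size_t)(c->k - c->buf);

    nbuf = realloc(c->buf, newsize);
    if (nbuf == NULL)
        return -1;
    memset(nbuf + c->size, 0, newsize - c->size);   /* clear the new tail */

    c->buf = nbuf;
    c->size = newsize;
    c->i = nbuf + offi;
    c->j = nbuf + offj;
    c->k = nbuf + offk;
    return 0;
}
```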
+
Jan 22, 2024:
Restore the ability to compile with g++. Thanks to
Arnold Robbins.
Dec 24, 2023:
- matchop dereference after free problem fix when the first
- argument is a function call. thanks to Oguz Ismail Uysal.
+ Fix matchop dereference-after-free problem when the first
+ argument is a function call. Thanks to Oguz Ismail Uysal.
Fix inconsistent handling of --csv and FS set in the
command line. Thanks to Wilbert van der Poel.
- casting changes to int for is* functions.
+ Casting changes to int for is* functions.
Nov 27, 2023:
- Fix exit status of system on MacOS. update to REGRESS.
+ Fix exit status of system on MacOS. Update to REGRESS.
Thanks to Arnold Robbins.
Fix inconsistent handling of -F and --csv, and loss of csv
mode when FS is set.
Nov 24, 2023:
Fix issue #199: gototab improvements to dynamically resize the
table, qsort and bsearch to improve the lookup speed as the
- table gets larger for multibyte input. thanks to Arnold Robbins.
+ table gets larger for multibyte input. Thanks to Arnold Robbins.
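Keeping each state's transitions sorted by character lets a lookup use `bsearch` instead of a linear scan, which pays off once multibyte input makes the tables large. A sketch, assuming a hypothetical `struct trans` in place of b.c's `gtte` and using 0 to mean "no transition":

```c
#include <stdlib.h>

/* Hypothetical transition entry: input character -> next DFA state.
 * Loosely modelled on gtte in b.c; not the real definition. */
struct trans {
    int ch;
    int state;
};

/* Three-way comparison by character, usable by both qsort and bsearch. */
static int trans_cmp(const void *a, const void *b)
{
    const struct trans *l = a, *r = b;
    return (l->ch > r->ch) - (l->ch < r->ch);
}

/* Look up ch in a table of n entries kept sorted with trans_cmp.
 * Returns the target state, or 0 when there is no transition. */
static int lookup_trans(const struct trans *tab, size_t n, int ch)
{
    struct trans key = { ch, 0 };
    const struct trans *hit = bsearch(&key, tab, n, sizeof *tab, trans_cmp);
    return hit != NULL ? hit->state : 0;
}
```

A caller that appends an entry out of order re-sorts with `qsort(tab, n, sizeof tab[0], trans_cmp)`, much as `set_gototab` does. The three-way comparison replaces the subtraction in `entry_cmp` only to sidestep any overflow concern.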
Nov 23, 2023:
Fix Issue #169, related to escape sequences in strings,
by Miguel Pineiro Jr.
Nov 20, 2023:
- rewrite of fnematch to fix a number of issues, including
+ Rewrite of fnematch to fix a number of issues, including
extraneous output, out-of-bounds access, number of bytes
to push back after a failed match etc.
- thanks to Miguel Pineiro Jr.
+ Thanks to Miguel Pineiro Jr.
Nov 15, 2023:
- Man page edit, regression test fixes. thanks to Arnold Robbins
- consolidation of sub and gsub into dosub, removing duplicate
- code. thanks to Miguel Pineiro Jr.
+ Man page edit, regression test fixes. Thanks to Arnold Robbins.
+ Consolidation of sub and gsub into dosub, removing duplicate
+ code. Thanks to Miguel Pineiro Jr.
gcc replaced with cc everywhere.
Oct 30, 2023:
- multiple fixes and a minor code cleanup.
- disabled utf-8 for non-multibyte locales, such as C or POSIX.
- fixed a bad char * cast that causes incorrect results on big-endian
- systems. also fixed an out-of-bounds read for empty CCL.
- fixed a buffer overflow in substr with utf-8 strings.
- many thanks to Todd C Miller.
+ Multiple fixes and a minor code cleanup.
+ Disabled utf-8 for non-multibyte locales, such as C or POSIX.
+ Fixed a bad char * cast that causes incorrect results on big-endian
+ systems. Also fixed an out-of-bounds read for empty CCL.
+ Fixed a buffer overflow in substr with utf-8 strings.
+ Many thanks to Todd C Miller.
Sep 24, 2023:
fnematch and getrune have been overhauled to solve issues around
- unicode FS and RS. also fixed gsub null match issue with unicode.
- big thanks to Arnold Robbins.
+ Unicode FS and RS. Also fixed gsub null match issue with Unicode.
+ Big thanks to Arnold Robbins.
Sep 12, 2023:
Fixed a length error in u8_byte2char that set RSTART to
of a string of 3 emojis is 3, not 12 as it would be if bytes
were counted.
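Counting characters rather than bytes comes down to skipping UTF-8 continuation bytes (those of the form `10xxxxxx`). A minimal sketch with a hypothetical `utf8_length`, assuming well-formed UTF-8; it is not awk's own UTF-8 code, which also has to cope with malformed input:

```c
#include <stddef.h>

/* Hypothetical: count the characters in a NUL-terminated UTF-8 string by
 * counting every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_length(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For a string of three 4-byte emojis it returns 3, while `strlen` returns 12.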
- Regular expressions are processes as UTF-8.
+ Regular expressions are processed as UTF-8.
Unicode literals can be written as \u followed by one
to eight hexadecimal digits. These may appear in strings and
regular expressions.
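A `\u` escape amounts to reading the hexadecimal digits and storing the resulting code point in its UTF-8 form. A sketch with hypothetical `read_u_digits` and `utf8_encode`; how awk itself treats surrogates or values above U+10FFFF is not shown here, and this version simply reports them as unencodable:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: read one to eight hex digits (the text after "\u") into
 * *cp. Returns the number of digits consumed; 0 means no escape value. */
static int read_u_digits(const char *s, unsigned long *cp)
{
    unsigned long v = 0;
    int n;

    for (n = 0; n < 8 && isxdigit((unsigned char)s[n]); n++) {
        int c = tolower((unsigned char)s[n]);
        v = v * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
    }
    *cp = v;
    return n;
}

/* Hypothetical: encode code point cp as UTF-8 into out (4 bytes of room).
 * Returns the byte count, or 0 for surrogates and values above U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With these helpers, `\u20AC` becomes the bytes E2 82 AC and `\u1F600` becomes F0 9F 98 80.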
-
The option `--csv` turns on CSV processing of input:
fields are separated by commas, fields may be quoted with
double-quote (`"`) characters, quoted fields may contain embedded newlines.
+A double-quote within a field must be doubled (`""`), and the field itself must be quoted.
In CSV mode, `FS` is ignored.
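These quoting rules can be sketched as a single-field parser. The function `csv_field` is hypothetical, not awk's CSV reader: it copies one field into `out`, treats `""` inside quotes as one `"`, keeps commas and newlines that appear inside quotes, and stops at an unquoted comma or the end of the string:

```c
#include <stddef.h>

/* Hypothetical, not awk's CSV reader: copy one CSV field from s into out
 * (out needs room for strlen(s) + 1 bytes) and return a pointer to the
 * character that ended it, either an unquoted ',' or the final '\0'.
 * Inside quotes, commas and newlines are data and "" stands for one '"'.
 * Record-ending newlines and malformed input are not handled. */
static const char *csv_field(const char *s, char *out)
{
    if (*s == '"') {
        s++;                               /* skip the opening quote */
        while (*s != '\0') {
            if (*s == '"') {
                if (s[1] != '"') {         /* a lone quote closes the field */
                    s++;
                    break;
                }
                s++;                       /* "" -> copy a single '"' */
            }
            *out++ = *s++;
        }
    }
    while (*s != ',' && *s != '\0')        /* unquoted text up to the comma */
        *out++ = *s++;
    *out = '\0';
    return s;
}
```

Given the input `"a ""b"", c",d`, the first call yields the field `a "b", c` and returns a pointer to the comma before `d`.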
If no explicit separator argument is provided,
If your system does not have `yacc` or `bison` (the GNU
equivalent), you need to install one of them first.
+The default in the `makefile` is `bison`; you will have
+to edit the `makefile` to use `yacc`.
NOTE: This version uses ISO/IEC C99, as you should also. We have
compiled this without any changes using `gcc -Wall` and/or local C
#### Last Updated
-Mon 30 Oct 2023 12:53:07 MDT
+Mon 05 Feb 2024 08:46:55 IST
-/* $OpenBSD: b.c,v 1.50 2024/01/25 16:40:51 millert Exp $ */
+/* $OpenBSD: b.c,v 1.51 2024/04/25 18:33:53 millert Exp $ */
/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved
size_t orig_size = f->gototab[state].allocated; // 2nd half of new mem is this size
memset(p + orig_size, 0, orig_size * sizeof(gtte)); // clean it out
- f->gototab[state].allocated = new_size; // update gotottab info
+ f->gototab[state].allocated = new_size; // update gototab info
f->gototab[state].entries = p;
}
-static int get_gototab(fa *f, int state, int ch) /* hide gototab inplementation */
+static int get_gototab(fa *f, int state, int ch) /* hide gototab implementation */
{
gtte key;
gtte *item;
return left->ch - right->ch;
}
-static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplementation */
+static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab implementation */
{
if (f->gototab[state].inuse == 0) {
f->gototab[state].entries[0].ch = ch;
if (tab->inuse + 1 >= tab->allocated)
resize_gototab(f, state);
- f->gototab[state].entries[f->gototab[state].inuse-1].ch = ch;
- f->gototab[state].entries[f->gototab[state].inuse-1].state = val;
+ f->gototab[state].entries[f->gototab[state].inuse].ch = ch;
+ f->gototab[state].entries[f->gototab[state].inuse].state = val;
f->gototab[state].inuse++;
return val;
} else {
gtt *tab = & f->gototab[state];
if (tab->inuse + 1 >= tab->allocated)
resize_gototab(f, state);
- ++tab->inuse;
f->gototab[state].entries[tab->inuse].ch = ch;
f->gototab[state].entries[tab->inuse].state = val;
+ ++tab->inuse;
qsort(f->gototab[state].entries,
f->gototab[state].inuse, sizeof(gtte), entry_cmp);
}
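The `set_gototab` fix restores the usual append order for a growable array: write the new entry into the first unused slot, `entries[inuse]`, and only then increment `inuse`. A sketch with a hypothetical `struct table`, doubling on growth and zeroing the new half much as the `resize_gototab` lines above suggest:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable table, loosely modelled on gtt in b.c. */
struct entry { int ch; int state; };
struct table {
    struct entry *entries;
    size_t inuse;       /* slots holding data */
    size_t allocated;   /* slots allocated */
};

/* Double the allocation (starting at 4) and zero the new slots.
 * Returns 0 on success, -1 if realloc fails. */
static int table_grow(struct table *t)
{
    size_t n = t->allocated ? t->allocated * 2 : 4;
    struct entry *p = realloc(t->entries, n * sizeof *p);

    if (p == NULL)
        return -1;
    memset(p + t->allocated, 0, (n - t->allocated) * sizeof *p);
    t->entries = p;
    t->allocated = n;
    return 0;
}

/* Append one entry: write into slot inuse, then increment inuse. */
static int table_append(struct table *t, int ch, int state)
{
    if (t->inuse + 1 >= t->allocated && table_grow(t) != 0)
        return -1;
    t->entries[t->inuse].ch = ch;
    t->entries[t->inuse].state = state;
    t->inuse++;
    return 0;
}
```

Writing to `entries[inuse-1]` instead overwrites the current last entry, and incrementing before writing leaves a zeroed slot behind; those are the two orderings the patch removes.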
-#define MAX_UTF_BYTES 4 // UTF-8 is up to 4 bytes long
-
/*
* NAME
* fnematch
do {
/*
- * Call u8_rune with at least MAX_UTF_BYTES ahead in
+ * Call u8_rune with at least awk_mb_cur_max ahead in
* the buffer until EOF interferes.
*/
- if (k - j < MAX_UTF_BYTES) {
- if (k + MAX_UTF_BYTES > buf + bufsize) {
+ if (k - j < awk_mb_cur_max) {
+ if (k + awk_mb_cur_max > buf + bufsize) {
+ char *obuf = buf;
adjbuf(&buf, &bufsize,
- bufsize + MAX_UTF_BYTES,
+ bufsize + awk_mb_cur_max,
quantum, 0, "fnematch");
+
+ /* buf resized, maybe moved. update pointers */
+ *pbufsize = bufsize;
+ if (obuf != buf) {
+ i = buf + (i - obuf);
+ j = buf + (j - obuf);
+ k = buf + (k - obuf);
+ *pbuf = buf;
+ if (patlen)
+ patbeg = buf + (patbeg - obuf);
+ }
}
- for (n = MAX_UTF_BYTES ; n > 0; n--) {
+ for (n = awk_mb_cur_max ; n > 0; n--) {
*k++ = (c = getc(f)) != EOF ? c : 0;
if (c == EOF) {
if (ferror(f))
s = 2;
} while (1);
- /* adjbuf() may have relocated a resized buffer. Inform the world. */
- *pbuf = buf;
- *pbufsize = bufsize;
-
if (patlen) {
/*
* Under no circumstances is the last character fed to
-/* $OpenBSD: main.c,v 1.68 2024/01/25 16:40:51 millert Exp $ */
+/* $OpenBSD: main.c,v 1.69 2024/04/25 18:33:53 millert Exp $ */
/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved
THIS SOFTWARE.
****************************************************************/
-const char *version = "version 20240122";
+const char *version = "version 20240422";
#define DEBUG
#include <stdio.h>
-/* $OpenBSD: run.c,v 1.84 2024/01/25 16:40:51 millert Exp $ */
+/* $OpenBSD: run.c,v 1.85 2024/04/25 18:33:53 millert Exp $ */
/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved
for (;;) {
n++;
t = s;
- while (*s != sep && *s != '\n' && *s != '\0')
+ while (*s != sep && *s != '\0')
s++;
temp = *s;
setptr(s, '\0');
Cell *dosub(Node **a, int subop) /* sub and gsub */
{
fa *pfa;
- int tempstat;
+ int tempstat = 0;
char *repl;
Cell *x;