Calling MB_CUR_MAX is much more expensive than incrementing a pointer
or testing and printing a byte, so do it once up front rather
than inside the inner loop. This speeds up rev(1) by about a factor
of three for typical use cases.
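For illustration only, a minimal sketch of the idea and not the
committed code: reverse_line() is a hypothetical helper that reads
MB_CUR_MAX once before the loop and then only tests a local flag per
character. It assumes the caller provides a destination buffer of at
least len bytes.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the rev(1) source: reverse the characters
 * of src[0..len) into dst, which must hold at least len bytes.
 */
static void
reverse_line(const char *src, size_t len, char *dst)
{
	/* Evaluate MB_CUR_MAX once; it may expand to a function call. */
	const int multibyte = MB_CUR_MAX > 1;
	const char *p = src, *end = src + len;
	char *out = dst + len;

	while (p < end) {
		int n = 1;

		if (multibyte) {
			n = mblen(p, (size_t)(end - p));
			if (n < 1)
				n = 1;	/* invalid or NUL byte: copy it as is */
		}
		/* Fill dst from the back so character order is reversed. */
		out -= n;
		memcpy(out, p, (size_t)n);
		p += n;
	}
}
```

In a single-byte locale the loop body reduces to a pointer decrement
and a one-byte copy, with no locale lookup per character.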
Performance issue found by cheloha@, but my fix is a bit simpler
and more rigorous than Scott's original patch.
While here, add the missing handling for write errors (making them
fatal, whereas read errors remain non-fatal and processing continues
with the next input file) and avoid testing each byte twice, making
the code more straightforward and more readable.
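A rough sketch of that error policy, assuming the BSD err(3) family is
available; copy_file() is a hypothetical name and the function copies
bytes unchanged rather than reversing lines, to keep the example short:

```c
#include <err.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the rev(1) source.  Returns 0 on success
 * and 1 if the file could not be opened or read; the caller can then
 * move on to the next input file.  A failed write terminates the
 * program.
 */
static int
copy_file(const char *path)
{
	FILE *fp;
	int ch, rval = 0;

	if ((fp = fopen(path, "r")) == NULL) {
		warn("%s", path);	/* non-fatal: report and return */
		return 1;
	}
	while ((ch = getc(fp)) != EOF)
		if (putchar(ch) == EOF)
			err(1, "stdout");	/* fatal: output is unusable */
	if (ferror(fp)) {
		warn("%s", path);	/* non-fatal read error */
		rval = 1;
	}
	fclose(fp);
	return rval;
}
```

A caller would typically OR the return values together and use the
result as the exit status after all files have been processed.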
In part using ideas from millert@ and martijn@.
OK martijn@.