Replace BN_lshift1()/BN_rshift1() with calls to BN_lshift()/BN_rshift().
Currently, BN_lshift1() and BN_rshift1() are separate implementations
that are intended to be faster since the shift is known (and only one bit
crosses a word boundary). However, with the rewrite of BN_lshift() and
BN_rshift(), they are either slower or only minimally faster (depending
on architecture).
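For illustration, a dedicated one-bit shift needs no general word or bit
offset. Each word passes exactly one bit to its neighbour, so a single carry
variable is enough. This sketch assumes a plain little-endian array of 64-bit
words rather than a BIGNUM. lshift1_words() and rshift1_words() are
hypothetical names, and this is not the code being removed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical dedicated one-bit left shift (not LibreSSL code).
 * a is a trimmed little-endian array of len words; r needs len + 1 words.
 * Only the top bit of each word crosses into the next word.
 * Returns the resulting number of significant words.
 */
static size_t
lshift1_words(uint64_t *r, const uint64_t *a, size_t len)
{
	uint64_t carry = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint64_t w = a[i];

		r[i] = (w << 1) | carry;
		carry = w >> 63;	/* bit that moves up one word */
	}
	r[len] = carry;
	return carry != 0 ? len + 1 : len;
}

/* Hypothetical one-bit right shift: each word takes the low bit above it. */
static size_t
rshift1_words(uint64_t *r, const uint64_t *a, size_t len)
{
	uint64_t carry = 0;
	size_t i;

	for (i = len; i > 0; i--) {
		uint64_t w = a[i - 1];

		r[i - 1] = (w >> 1) | carry;
		carry = w << 63;	/* bit that moves down one word */
	}
	/* Drop a most significant word that became zero. */
	while (len > 0 && r[len - 1] == 0)
		len--;
	return len;
}
```

Each loop copies the source word before it writes, so r may alias a in both
functions.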
Avoid duplication and turn BN_lshift1()/BN_rshift1() into functions that
call inlined versions of BN_lshift()/BN_rshift(), making BN_lshift() and
BN_rshift() call the same inlined implementation. This results in a single
implementation and BN_lshift1()/BN_rshift1() that outperform the previous
versions (in part due to compiler optimisation).
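A minimal sketch of the resulting structure, assuming a plain little-endian
array of 64-bit words in place of a BIGNUM. There is one static inline
implementation per direction, and both the general and the one-bit entry
points call it. All names here (word_t, lshift_impl(), rshift_impl(),
bn_lshift(), bn_lshift1() and so on) are hypothetical, and this is not the
committed implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;	/* hypothetical limb type */
#define WORD_BITS 64

/* Strip most significant zero words; return the significant length. */
static size_t
trim_len(const word_t *r, size_t len)
{
	while (len > 0 && r[len - 1] == 0)
		len--;
	return len;
}

/*
 * Shared r = a << n; r needs len + n / WORD_BITS + 1 words. Walking from
 * the top word down lets r alias a. The bits != 0 tests avoid shifting a
 * word by WORD_BITS, which is undefined in C.
 */
static inline size_t
lshift_impl(word_t *r, const word_t *a, size_t len, unsigned int n)
{
	size_t words = n / WORD_BITS, i;
	unsigned int bits = n % WORD_BITS;

	if (len == 0)
		return 0;
	r[len + words] = bits != 0 ? a[len - 1] >> (WORD_BITS - bits) : 0;
	for (i = len - 1; i > 0; i--) {
		word_t w = a[i] << bits;

		if (bits != 0)
			w |= a[i - 1] >> (WORD_BITS - bits);
		r[i + words] = w;
	}
	r[words] = a[0] << bits;
	for (i = 0; i < words; i++)
		r[i] = 0;
	return trim_len(r, len + words + 1);
}

/* Shared r = a >> n; r needs len words. Walking upwards lets r alias a. */
static inline size_t
rshift_impl(word_t *r, const word_t *a, size_t len, unsigned int n)
{
	size_t words = n / WORD_BITS, rlen, i;
	unsigned int bits = n % WORD_BITS;

	if (words >= len)
		return 0;
	rlen = len - words;
	for (i = 0; i < rlen; i++) {
		word_t w = a[i + words] >> bits;

		if (bits != 0 && i + 1 < rlen)
			w |= a[i + words + 1] << (WORD_BITS - bits);
		r[i] = w;
	}
	return trim_len(r, rlen);
}

/* General entry points call the shared inline implementations... */
size_t
bn_lshift(word_t *r, const word_t *a, size_t len, unsigned int n)
{
	return lshift_impl(r, a, len, n);
}

size_t
bn_rshift(word_t *r, const word_t *a, size_t len, unsigned int n)
{
	return rshift_impl(r, a, len, n);
}

/* ...and the one-bit variants pass a constant shift to the same code. */
size_t
bn_lshift1(word_t *r, const word_t *a, size_t len)
{
	return lshift_impl(r, a, len, 1);
}

size_t
bn_rshift1(word_t *r, const word_t *a, size_t len)
{
	return rshift_impl(r, a, len, 1);
}
```

Because bn_lshift1() passes the constant 1, a compiler that inlines
lshift_impl() can fold the word and bit offsets and drop the bits != 0
branches. That is one plausible source of the gain attributed to compiler
optimisation above, though the actual effect depends on compiler and
architecture.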
Now that none of the original code exists, replace the license and
copyright for this file.
ok tb@