Demacro SHA-512.

Use static inline functions instead of macros to implement SHA-512. At
the same time, make two key changes. Firstly, rather than trying to
outsmart the compiler and shuffle variables around, write the algorithm
the way it is documented and actually swap the variable contents. Secondly,
instead of interleaving the message schedule update with the rounds, do
the full message schedule update first, then process the rounds.
Overall, we get safer and more readable code. Additionally, the compiler
can generate smaller and faster code (with a gain of 5-10% across a range
of architectures).
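
As a rough illustration of the two changes described above, here is a
minimal sketch, not the committed code. All sketch_* names, the
array-based state layout and the 80-word schedule buffer are assumptions
made for this example. The logical functions follow FIPS 180-4; the round
constant Kt and schedule word Wt are passed in by the caller, so the
constant table is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit word right by n bits; callers pass 0 < n < 64. */
static inline uint64_t
sketch_rotr64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* FIPS 180-4 logical functions for SHA-512 (hypothetical names). */
static inline uint64_t
sketch_Ch(uint64_t x, uint64_t y, uint64_t z)
{
	return (x & y) ^ (~x & z);
}

static inline uint64_t
sketch_Maj(uint64_t x, uint64_t y, uint64_t z)
{
	return (x & y) ^ (x & z) ^ (y & z);
}

static inline uint64_t
sketch_Sigma0(uint64_t x)
{
	return sketch_rotr64(x, 28) ^ sketch_rotr64(x, 34) ^
	    sketch_rotr64(x, 39);
}

static inline uint64_t
sketch_Sigma1(uint64_t x)
{
	return sketch_rotr64(x, 14) ^ sketch_rotr64(x, 18) ^
	    sketch_rotr64(x, 41);
}

static inline uint64_t
sketch_sigma0(uint64_t x)
{
	return sketch_rotr64(x, 1) ^ sketch_rotr64(x, 8) ^ (x >> 7);
}

static inline uint64_t
sketch_sigma1(uint64_t x)
{
	return sketch_rotr64(x, 19) ^ sketch_rotr64(x, 61) ^ (x >> 6);
}

/*
 * Message schedule: W[0..15] hold the block's message words on entry.
 * All remaining words are computed up front, before any round runs.
 */
static inline void
sketch_schedule(uint64_t W[80])
{
	size_t t;

	for (t = 16; t < 80; t++)
		W[t] = sketch_sigma1(W[t - 2]) + W[t - 7] +
		    sketch_sigma0(W[t - 15]) + W[t - 16];
}

/*
 * One round on the working state s[0..7] = a..h. The values are
 * actually moved between variables, as in the specification, rather
 * than rotating the argument order from one round to the next.
 */
static inline void
sketch_round(uint64_t s[8], uint64_t Kt, uint64_t Wt)
{
	uint64_t T1, T2;

	T1 = s[7] + sketch_Sigma1(s[4]) + sketch_Ch(s[4], s[5], s[6]) +
	    Kt + Wt;
	T2 = sketch_Sigma0(s[0]) + sketch_Maj(s[0], s[1], s[2]);

	s[7] = s[6];		/* h = g */
	s[6] = s[5];		/* g = f */
	s[5] = s[4];		/* f = e */
	s[4] = s[3] + T1;	/* e = d + T1 */
	s[3] = s[2];		/* d = c */
	s[2] = s[1];		/* c = b */
	s[1] = s[0];		/* b = a */
	s[0] = T1 + T2;		/* a = T1 + T2 */
}
```

A caller would load the 16 message words into W[0..15], call
sketch_schedule() once, then call sketch_round() for t = 0..79 with the
matching round constant and W[t]. Whether the real code expands the
whole schedule at once or in smaller batches is not stated here, so
treat that detail as part of the assumption.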
ok beck@ tb@