Reimplement bn_mul_words(), bn_mul_add_words() and bn_mul_comba{4,8}().
Use bignum primitives rather than the current mess of macros, which also
allows us to remove the essentially duplicate versions of
bn_mul_words() and bn_mul_add_words() for BN_LLONG.
The "mul" macro gets replaced by bn_mulw_addw(), "mul_add" with
bn_mulw_addw_addw() and "mul_add_c" with bn_mulw_addtw() (where 'w'
indicates single word input and 'tw' indicates triple word input).
The variables in the comba functions have also been reordered, so that the
patterns are easier to understand - the compiler can take care of
optimising the inputs and outputs to avoid register moves.
ok tb@