Significant overhaul of the floating point save/restore code.
At this point the mechanism should closely resemble the powerpc64
save/restore points with one difference. (reload avoidance)
The previous 'aggressive' fpu save code that was (mostly) implemented before
and is present on arm32 and arm64.
There is one piece from that other design that remains, if
pcb->pcb_fpcpu == ci && ci->ci_fpuproc == p
after sleep, this will automatically re-activate the FPU state without
needing to reload it.
To enable this, the pointer pair is not changed on FPU context save
to indicate that the CPU still holds the valid content as long as both
of those pointers are pointing to each other.
Note that if another core steals the FPU conxtex (when we get to SMP)
the pcb->pcb_fpcpu will be another cpu, and from that it will know
to reload the FPU context. Also optimistically enabling this only makes
sense on riscv64 because there is the notion of FPU on and clean. Other
implimentations would need to 'fault on' the FPU enable, but could avoid
the FPU context load if no other processor has run this FPU context and no
other process has use FPU on this core.
ok kettenis@ deraadt@ Prior to a couple of fixes.