Use the deep idle state available on Apple M1/M2 cores in the idle loop and
for suspend. This state makes the CPU lose some of its register state so
we need to save these registers before putting the core to sleep and
restore them when we wake up. This deep idle state has a higher wakeup
latency than the normal WFI idle state. Use similar logic as acpucpu(4) to
decide which idle state to pick.
If some cores of a cluster are in this deep idle state, turbo states become
available to the cores that remain active. So stop skipping these states.
This improves single-core performance a little bit.
The main win is in power savings when running in a state with a high clock
frequency. My M2 Pro mini goes from 14W to 6.5W when idle at the maximum
clock frequency. But event at the lowest clock frequency there are small
but significant power savings.
ok deraadt@, tobhe@