Add per-CPU caches to the pmemrange allocator.
The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.
Each cache is composed of two magazines, design is borrowed from jeff bonwick
vmem's paper and the implementation is similar to the one of pool_cache from
dlg@. However there is no depot layer and magazines are refilled directly by
the pmemrange allocator.
This version includes splvm()/splx() dances because the buffer cache flips
buffers in interrupt context. So we have to prevent recursive accesses to
per-CPU magazines.
Tested by naddy@, solene@, krw@, robert@, claudio@ and Laurence Tratt.
ok claudio@, kettenis@