Improve spinning in mtx_enter().
Instead of calling mtx_enter_try() on every iteration of the spin
loop, call it only when a lockless read indicates that the mutex has
been released. This avoids some expensive atomic compare-and-swap
operations. During a kernel build, spinning time dropped by up to 5%
on an 8-core amd64 machine. On other machines there was no visible
effect.
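The spinning change follows the test-and-test-and-set pattern. A
minimal sketch of that pattern, assuming C11 <stdatomic.h>; it is not
OpenBSD's actual mtx_enter(). The names sketch_mutex,
sketch_mtx_enter_try(), sketch_mtx_enter() and sketch_mtx_leave() are
hypothetical, and the real struct mutex carries more state than the
single owner field shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mutex: only the owner field. */
struct sketch_mutex {
	_Atomic(void *) owner;	/* NULL while the mutex is free */
};

/* One compare-and-swap attempt, analogous to mtx_enter_try(). */
static int
sketch_mtx_enter_try(struct sketch_mutex *m, void *self)
{
	void *expected = NULL;

	return atomic_compare_exchange_strong_explicit(&m->owner,
	    &expected, self, memory_order_acquire, memory_order_relaxed);
}

/*
 * Spin until the mutex is acquired.  The CAS is retried only after a
 * plain (relaxed) load has seen the owner field become NULL, so
 * waiters spin on a shared, cached copy of the line instead of
 * issuing a CAS on every iteration.
 */
static void
sketch_mtx_enter(struct sketch_mutex *m, void *self)
{
	for (;;) {
		if (sketch_mtx_enter_try(m, self))
			return;
		while (atomic_load_explicit(&m->owner,
		    memory_order_relaxed) != NULL)
			;	/* a kernel would add a CPU busy-cycle hint */
	}
}

/* Release the mutex; the caller must be the current owner. */
static void
sketch_mtx_leave(struct sketch_mutex *m, void *self)
{
	assert(atomic_load_explicit(&m->owner,
	    memory_order_relaxed) == self);
	atomic_store_explicit(&m->owner, NULL, memory_order_release);
}
```

The relaxed load in the inner loop is only a hint; correctness still
rests on the acquire CAS, which is why the loop goes back to
sketch_mtx_enter_try() instead of assuming the lock is now held.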
Testing on powerpc64 revealed a bug in the mtx_owner declaration: it
was not the variable itself that was volatile, but the object it
points to. Move the volatile qualifier in struct mutex so that the
pointer itself is volatile; this avoids a hang when going to
multiuser.
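The bug is about where the qualifier binds. A minimal sketch, with
hypothetical structs mutex_before and mutex_after reduced to the one
field; this is not the actual OpenBSD header. The comments give a
plausible mechanism for the hang (a cached load of the owner field),
which is an assumption, not something the test on powerpc64 confirmed.

```c
#include <stddef.h>

/*
 * Before: "pointer to volatile void".  The pointed-to object is
 * volatile, but reads of the mtx_owner field itself are ordinary
 * loads that the compiler may keep in a register across a loop.
 */
struct mutex_before {
	volatile void	*mtx_owner;
};

/*
 * After: "volatile pointer to void".  Every read of mtx_owner is a
 * volatile access and is performed again each time it is evaluated.
 */
struct mutex_after {
	void * volatile	 mtx_owner;
};

/*
 * Spin until the owner field reads as NULL.  Because mtx_owner is
 * volatile in mutex_after, each iteration reloads it from memory.
 * The same loop over mutex_before could legally be compiled into a
 * single load followed by an endless loop on the stale value.
 */
static void
wait_released_after(struct mutex_after *mtx)
{
	while (mtx->mtx_owner != NULL)
		;
}
```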
from Mateusz Guzik; input kettenis@ jca@; OK mpi@