Don't clear the cpu's bit in the old pmap's pm_cpus until we're off
the old one and set it in the new pmap's pm_cpus before loading
%cr3 with the new value. In particular, do neither if %cr3 isn't
changing.
This eliminates a window where, when switching between threads in
a single a process, the pmap wouldn't have this cpu's bit set even
though we didn't change %cr3. With more of uvm unlocked, it was
possible for another cpu to update the page tables but not see a
need to send an IPI to this cpu, leading to crashes when TLB entries
that should have been invalidated were used.
malloc_duel testing by abluhm@
ok abluhm@ kettenis@ mlarkin@