Rework interaction between sleep API and exit1() and start unlocking ps_threads
This diff adjusts how single_thread_set() accounts the threads by using
ps_threadcnt as initial value and counting all threads out that are already
parked. In single_thread_check call exit1() before decreasing ps_singlecount
this is now done in exit1().
exit1() and thread_fork() ensure that ps_threadcnt is updated with the
pr->ps_mtx held and in exit1() also account for exiting threads since
exit1() can sleep.
OK mpi@