Remove data dependency barrier from atomic_load_* functions
This makes the atomic_load_* functions relaxed in terms of memory
ordering. Now it should be acceptable to use these functions in
assertions.
Whether the data dependency barrier is needed depends on usage.
The barrier is unnecessary for the control decisions that cond_wait()
and refcnt_finalize() make. READ_ONCE() and SMR_PTR_GET() use the
barrier so that loaded pointers would work as expected in lock-free
contexts (some Alpha CPUs have a data cache design that can cause
unusual load-load reordering if not synchronized properly).
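The distinction between the two kinds of load can be sketched in
userland C. This is an illustration only, not the kernel code: it
assumes C11 <stdatomic.h>, uses memory_order_consume as a stand-in for
the data dependency barrier, and the names struct obj, refs, shared,
is_last_ref(), read_value() and publish() are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object, for illustration only. */
struct obj {
	int value;
};

static atomic_uint refs = 1;		/* hypothetical reference count */
static _Atomic(struct obj *) shared;	/* hypothetical published pointer */

/*
 * Control decision: only the loaded value itself is inspected and
 * nothing is dereferenced through it, so a relaxed load is enough.
 */
static int
is_last_ref(void)
{
	return atomic_load_explicit(&refs, memory_order_relaxed) == 1;
}

/*
 * Pointer load: the result is dereferenced, so the load has to be
 * ordered before the dependent access. memory_order_consume is an
 * assumed stand-in for the data dependency barrier.
 */
static int
read_value(void)
{
	struct obj *p = atomic_load_explicit(&shared, memory_order_consume);

	return p != NULL ? p->value : -1;
}

/* Publish an object; the release store pairs with the load above. */
static void
publish(struct obj *p)
{
	atomic_store_explicit(&shared, p, memory_order_release);
}
```

A caller would initialize a struct obj, hand it to publish(), and then
read it back through read_value(); is_last_ref() never touches memory
reached through the value it loads, which is why it can stay relaxed.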
OK bluhm@