Add pre-DMA-write barrier after data is stored to memory
There's already such a barrier in usbd_transfer() code-path, but this
one is called when the frames are queued to the HC ring. The audio
samples are stored in memory by userland later, *after* the frames are
scheduled (but before they are sent on the wire) so a barrier is
needed there. Without this change, the data produced by userland may
stay in the CPU caches and is not "seen" by the HC's DMA engine, in
turn the device plays noise on certain arm64 machines (RPI4, for
instance).
Fix mostly from Luca Castagnini with few tweaks from me. OK patrick@