Run raw IP input in parallel.
Running raw IPv4 input in parallel with a shared net lock is less
complex than doing so for UDP; in particular, there is no socket
splicing.
The new ip_deliver() may run with either a shared or an exclusive
net lock; its last parameter indicates the mode. If it is running
with a shared net lock and encounters a protocol that needs an
exclusive lock, the packet is queued. The old ip_ours() always
queued the packet. Now it calls ip_deliver() with a shared net lock,
and if that cannot handle the packet completely, the packet is
queued and later processed with an exclusive net lock.
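A minimal user-space sketch of the shared-then-exclusive fallback,
assuming a simplified packet model. All names here (deliver(),
drain_queue(), proto_mpsafe(), struct pkt, the PROTO_* values) are
hypothetical illustrations and not the kernel's ip_deliver() or its
queue; only the control flow mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical protocol identifiers, not real IP protocol numbers. */
enum { PROTO_RAW_OK = 1, PROTO_NEEDS_EXCLUSIVE = 2 };

/* Hypothetical result codes for this sketch only. */
enum deliver_result { DELIVERED, QUEUED };

struct pkt {
	int proto;
	bool delivered;
};

#define QUEUE_MAX 16
static struct pkt *queue[QUEUE_MAX];
static int queue_head, queue_tail;

/* Hypothetical: may this protocol be handled under a shared lock? */
static bool
proto_mpsafe(int proto)
{
	return proto == PROTO_RAW_OK;
}

/* Deliver a packet; 'shared' says which lock mode the caller holds. */
static enum deliver_result
deliver(struct pkt *p, bool shared)
{
	if (shared && !proto_mpsafe(p->proto)) {
		/* Cannot finish under the shared lock: defer the packet. */
		assert(queue_tail < QUEUE_MAX);
		queue[queue_tail++] = p;
		return QUEUED;
	}
	p->delivered = true;
	return DELIVERED;
}

/* Later, with the exclusive lock held, process deferred packets FIFO. */
static int
drain_queue(void)
{
	int n = 0;

	while (queue_head < queue_tail) {
		deliver(queue[queue_head++], false);
		n++;
	}
	queue_head = queue_tail = 0;
	return n;
}
```

A packet for an MP-safe protocol is finished on the first call; any
other packet costs one queue round trip, which matches the old
ip_ours() behaviour only for the protocols that still need it.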
If an IPv6 header chain switches from shared to exclusive
processing, the next protocol and the mbuf offset are stored in an
mbuf tag.
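A minimal sketch of resuming a header chain from saved state,
assuming headers are laid out back to back and modelled as an array.
struct resume_tag stands in for the mbuf tag; it and walk_chain(),
struct hdr, struct pkt6 and the HDR_* values are hypothetical names
for illustration, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical header kinds for this sketch. */
enum { HDR_SHARED_OK = 0, HDR_NEEDS_EXCL = 1 };

struct hdr {
	int type;	/* next-header value of this header */
	int len;	/* length of this header in bytes */
};

/* Hypothetical stand-in for the mbuf tag carrying resume state. */
struct resume_tag {
	int nxt;	/* protocol to continue with */
	int off;	/* byte offset of that header in the packet */
	bool valid;
};

struct pkt6 {
	const struct hdr *hdrs;
	int nhdr;
	struct resume_tag tag;
	int processed;	/* headers handled so far, for illustration */
};

/* Returns true when the chain is done, false when deferred. */
static bool
walk_chain(struct pkt6 *p, bool shared)
{
	int i = 0, off = 0;

	if (p->tag.valid) {
		/* Skip the headers the shared pass already handled. */
		while (i < p->nhdr && off < p->tag.off) {
			off += p->hdrs[i].len;
			i++;
		}
		p->tag.valid = false;
	}
	for (; i < p->nhdr; i++) {
		if (shared && p->hdrs[i].type == HDR_NEEDS_EXCL) {
			/* Remember where to continue under the exclusive lock. */
			p->tag.nxt = p->hdrs[i].type;
			p->tag.off = off;
			p->tag.valid = true;
			return false;
		}
		p->processed++;
		off += p->hdrs[i].len;
	}
	return true;
}
```

Saving the protocol and offset lets the exclusive pass continue at
the header that stopped the shared pass instead of reparsing the
chain from the start.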
OK mvs@