Explicitly allocate stack memory for ICMP payload in IPv4 forward.
The old ip_forward() allocated a fake mbuf copy on the stack to send
an ICMP packet after ip_output() failed. It seems easier to just
copy the data that icmp_error() may use onto the stack. Create the
mbuf only if the ICMP error packet is actually sent.
m_dup_pkthdr() uses an atomic operation to link the inpcb to the
mbuf. pf_pkt_addr_changed() was called immediately afterwards to
remove the linkage again. m_tag_delete_chain() was also overhead.
The new code uses less CPU locking in the hot path.
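The idea can be sketched in userland C roughly as follows. This is
only an illustration and not the kernel code: struct fake_mbuf,
struct icmp_save, SAVE_MAX, save_payload(), build_error_mbuf() and
forward() are hypothetical names, and the output_error argument
stands in for the result of ip_output().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SAVE_MAX 68	/* hypothetical: bytes of the packet to keep */

/* Hypothetical stand-in for an mbuf; not the kernel structure. */
struct fake_mbuf {
	size_t		len;
	unsigned char	data[SAVE_MAX];
};

/* Plain stack buffer holding the start of the forwarded packet. */
struct icmp_save {
	size_t		len;
	unsigned char	buf[SAVE_MAX];
};

/* Done for every forwarded packet: a bounded copy, no allocation. */
static void
save_payload(struct icmp_save *sv, const unsigned char *pkt, size_t pktlen)
{
	sv->len = pktlen < SAVE_MAX ? pktlen : SAVE_MAX;
	memcpy(sv->buf, pkt, sv->len);
}

/* Error path only: build the buffer that the ICMP code would consume. */
static struct fake_mbuf *
build_error_mbuf(const struct icmp_save *sv)
{
	struct fake_mbuf *m;

	assert(sv->len <= SAVE_MAX);
	m = malloc(sizeof(*m));
	if (m == NULL)
		return NULL;
	m->len = sv->len;
	memcpy(m->data, sv->buf, sv->len);
	return m;
}

/*
 * Hypothetical forward path.  Returns NULL when output succeeded,
 * otherwise a freshly allocated buffer for the error packet.
 */
static struct fake_mbuf *
forward(const unsigned char *pkt, size_t pktlen, int output_error)
{
	struct icmp_save sv;

	save_payload(&sv, pkt, pktlen);
	if (output_error == 0)
		return NULL;	/* hot path: nothing allocated */
	return build_error_mbuf(&sv);
}
```

In this sketch the common case costs one bounded memcpy() onto the
stack, and the allocation is deferred to the rare error case.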
OK deraadt@ claudio@