add the mbuf tag that prevents loops in vxlan_encap, not vxlan_output.
vxlan_output calls ether_output, which will do arp for ipv4 packets.
if arp hasn't resolved an address for a peer yet, it will queue the
packet and transmit it again after resolution completes. the way
it outputs is to call the interface output routine again, which is
vxlan_output.
if we tag the packet in vxlan_output before arp, and then arp calls
vxlan_output again, it looks like a loop and drops it. moving the
tagging to when we add all the encap headers in vxlan_encap avoids
this issue.