Forming a VXLAN tunnel using static entries
- Assume L1 and L4 are configured with static ARP entries, i.e. L1 and L4 have already learnt each other's MAC information.
- When 184.108.40.206 tries to reach 220.127.116.11, L1 encapsulates the frame and forwards it to L4. Check out how switch L1 formats the frame.
Therefore, we have tunnel endpoints in 100.100.100.X and underlay 18.104.22.168X communicating with each other.
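To make the frame format a bit more concrete, here is a minimal sketch of the 8-byte VXLAN header that a leaf such as L1 places between the outer UDP header and the original (inner) Ethernet frame. The header layout and UDP port 4789 come from RFC 7348; the VNI value 5000 and the helper names are assumptions for illustration only, not values taken from the capture.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port (RFC 7348)


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # "I" flag set: the VNI field is valid
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", flags << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to the inner Ethernet frame.

    The outer Ethernet/IP/UDP headers (VTEP to VTEP) are added by the
    switch and are left out of this sketch.
    """
    return build_vxlan_header(vni) + inner_frame


def parse_vni(vxlan_payload: bytes) -> int:
    """Extract the VNI from the start of a VXLAN payload."""
    _, word2 = struct.unpack("!II", vxlan_payload[:8])
    return word2 >> 8


# Hypothetical VNI 5000, chosen only for the example
header = build_vxlan_header(5000)
print(header.hex())
```

On the wire, the outer source and destination IPs would be the tunnel endpoints (100.100.100.X), while the host addresses stay untouched inside the inner frame.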
As the network grows, it becomes difficult to configure and maintain every static tunnel endpoint. Instead, we can use multicast to scale and learn thousands of remote VXLAN tunnel endpoints. Let's explore using multicast to build VXLAN tunnel endpoints.
Same topology, let’s explore using Mcast to learn tunnel endpoints!
- Assume multicast is enabled between the spine and leaf switches.
- When 22.214.171.124 tries to reach 126.96.36.199, we need to establish a tunnel between L1 and L4 (in our case, endpoints 100.100.100.1 and 100.100.100.200).
- The multicast capture below shows how 100.100.100.1 learns the remote MAC address of L4 using destination multicast IP 188.8.131.52 and multicast MAC address 01:00:5e:00:00:58.
It's very important to note from the above capture that leaf L1, in the 100.100.100.X network, uses destination multicast IP 184.108.40.206 to learn the remote MAC address of leaf L4. Please note the outer MAC/IP headers.
4. Once L1 has learnt L4's MAC address, it encapsulates data from the 2.2.2.x network inside 100.100.100.x. At this stage we can see that leaf L1 has learnt the remote MAC address of L4 (10:00:00:10:00:11), as shown below. Please check the outer IP/MAC headers.
5. The capture below, taken on leaf L1, shows the ARP reply coming from 220.127.116.11 (the host attached to L4, with MAC 00:00:10:01:00:01).
In a nutshell, the outer IP and MAC addresses are learnt via multicast, so a VXLAN tunnel is formed between L1 and L4, and we can see the inner IPs (2.2.2.x) communicating with each other!
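The multicast MAC in the outer header is not arbitrary: for IPv4, the destination MAC is the fixed prefix 01:00:5e followed by the low 23 bits of the group address (RFC 1112). Here is a small sketch of that mapping. The group addresses 239.0.0.88 and 239.128.0.88 are assumptions picked for illustration, not the group used in the capture.

```python
import ipaddress


def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # keep only the low 23 bits of the group
    mac = (0x01005E << 24) | low23        # fixed 01:00:5e prefix, 24th bit is 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# Hypothetical groups, used only for illustration
print(multicast_ip_to_mac("239.0.0.88"))    # 01:00:5e:00:00:58
print(multicast_ip_to_mac("239.128.0.88"))  # 01:00:5e:00:00:58 (same MAC)
```

Because only 23 bits are copied, 32 different groups share each MAC address, which is one reason to check the outer IP header as well as the outer MAC when reading a capture.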
As the network grows larger, flooding to learn MAC addresses and build tunnel endpoints is not a great idea. Next-generation tunnel endpoints are constructed using MP-BGP.
MP-BGP: EVPN to learn remote MAC addresses and construct VXLAN tunnels
The same old host tries to reach 18.104.22.168 from 22.214.171.124.
EVPN has five different types of control-plane messages, each of which helps to solve a different use case. The control-plane message below is type 2 (the MAC/IP Advertisement route), used here to carry host MAC and IP information. (In a large-scale network, the spine/leaf switches would know all VM prefixes.)
Let's take a brief look at the EVPN type-2 exchanges.
- We assume leaf and spine are configured correctly and the iBGP session between L1 and L4 is up after the BGP OPEN messages are exchanged. In this scenario, the BGP session coming up means L1 and L4 have learnt each other's MAC addresses.
- MP-BGP EVPN type-2 (IP:MAC) routes are used to learn host IP prefixes. In our case these prefixes are 126.96.36.199 and 188.8.131.52, attached to switch ports.
- EVPN NLRI
Once the iBGP session is established, the switch uses these MAC addresses to encapsulate VXLAN traffic. From the BGP UPDATE message below, we can see that 184.108.40.206/00:00:10:01:00:00 is learnt by the network using BGP.
4. After the control plane has been learnt, when 220.127.116.11 communicates with 18.104.22.168, the Wireshark flow looks as shown below. Note that the outer destination MAC address 10:00:00:10:00:11 was learnt when the iBGP session was established.
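For readers who want to see what a type-2 route carries, here is a rough sketch that packs the fields of an EVPN MAC/IP Advertisement NLRI in the order given by RFC 7432, with the VNI placed in the label field as RFC 8365 describes for VXLAN. The MAC 00:00:10:01:00:00 and the VTEP 100.100.100.1 appear in this post; the route distinguisher number, the host IP 192.0.2.10 and VNI 5000 are assumptions made up for the example, and the class name is hypothetical.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class MacIpRoute:
    """Fields of an EVPN route type 2 (MAC/IP Advertisement)."""
    rd: bytes            # 8-octet route distinguisher
    esi: bytes           # 10-octet Ethernet segment identifier
    ethernet_tag: int    # 4-octet Ethernet tag ID
    mac: str             # host MAC, e.g. "00:00:10:01:00:00"
    ip: Optional[str]    # host IP, or None for a MAC-only route
    vni: int             # 24-bit VNI carried in the label field

    def encode(self) -> bytes:
        mac_bytes = bytes.fromhex(self.mac.replace(":", ""))
        ip_bytes = ipaddress.ip_address(self.ip).packed if self.ip else b""
        body = (
            self.rd
            + self.esi
            + struct.pack("!I", self.ethernet_tag)
            + bytes([48]) + mac_bytes                 # MAC length in bits, then MAC
            + bytes([len(ip_bytes) * 8]) + ip_bytes   # IP length in bits, then IP
            + self.vni.to_bytes(3, "big")             # label field holding the VNI
        )
        # NLRI = route type (2) + length of the body + body
        return bytes([2, len(body)]) + body


# Type-1 RD built from the L1 VTEP address; the number 100 is an assumption
rd = struct.pack("!H4sH", 1, ipaddress.IPv4Address("100.100.100.1").packed, 100)

route = MacIpRoute(
    rd=rd,
    esi=bytes(10),              # all-zero ESI: single-homed host
    ethernet_tag=0,
    mac="00:00:10:01:00:00",
    ip="192.0.2.10",            # hypothetical host IP
    vni=5000,                   # hypothetical VNI
)
print(route.encode().hex())
```

The remote leaf pairs this NLRI with the BGP next hop (the advertising VTEP) to decide where to send encapsulated traffic for that MAC, so no flooding is needed to discover it.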
Overall, it's all about learning remote MAC addresses to form VXLAN tunnels and using this information to encapsulate the inner IP traffic. I believe the easiest way is to use multicast; the only problem is that the spine/leaf hardware switches will never learn these prefixes.
In contrast, most spine/leaf fabrics already run iBGP sessions to form a loop-free topology, so it makes sense to use EVPN, which helps to expose the host prefixes. This way the network engineer knows all overlay/underlay prefixes, and the same information could be used to build smart analytics.