Making ARP suppression work with pure L2 EVPN on Linux and FRR
In my previous post, I described the issues that prevent ARP suppression from working in an L2-only EVPN topology built with Linux and FRR.
Here I am going to describe how a small process running on each leaf can fix the problem.
The problem
Here I’ll quickly summarize the problem. Given a topology like the following:
┌────────────────────────────────┐
│ │
│ │
│ │
┌───────────┐ ARP │ ┌───────────────┐ │
│ │ ├──────┐ │ │ │
│ host2 │◄────┤ eth1 ├────┤ br-10 │ │
│ │ ├──────┘ │ │ │
└───────────┘ │ └──────────▲────┘ │
│ │ │
│ │ │
│ │ │
│ leaf2 │ │
└──────────────────────┼─────────┘
│
│ VXLan
│
┌──────────────────────┼─────────┐
│ │ │
│ │ │
│ │ │
┌───────────┐ARP │ ┌──────────┴────┐ │
│ │ ├──────┐ │ │ │
│ host1 ├────►│ eth1 ├────┤ br-10 │ │
│ │ ├──────┘ │ │ │
└───────────┘ │ └───────────────┘ │
│ │
│ │
│ │
│ leaf1 │
└────────────────────────────────┘
- being unicast, the ARP replies won't land on leaf1
- leaf1's neighbor table is never filled
- the Zebra instance on the leaf1 endpoint won't learn the MAC / IP association
- FRR won't send the MAC / IP association to leaf2 as a type 2 EVPN route
- ARP suppression on leaf2 won't work, and every ARP request is going to be forwarded through the fabric
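The missing state is easy to observe from FRR itself. Assuming the same containerlab setup as in the previous post, the per-VNI arp-cache on leaf2 (the table ARP suppression answers from) never gets an entry for host1:

docker exec -it clab-evpnl2-leaf2 vtysh -c "show evpn arp-cache vni all"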
Filling the neighbor table
In my previous post, I explained how the root cause of ARP suppression not working was the fact that leaf1's neighbor table wasn't getting filled, and I showed how pinging host1 from leaf1 would fill leaf1's neighbor table, making all the machinery effectively work.
Additionally, this issue comment on the FRR GitHub repo explains how Cumulus Linux solves this problem by listening to ARP requests and updating the local neighbor table accordingly.
eBPF to the rescue!
eBPF is not as cool as AI, but it can certainly be more effective in this scenario. The plan is simple: write a program that leverages eBPF to listen to ARP requests / replies and fill the neighbor table from user space.
┌────────────────────────────────┐
│ │
│ ┌────────────┐ │
│ │ neigh table│ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ ┌─────────┐ └────────────┘ │
│ ┌►│ ebpf │ ▲ │
│ │ └─────┬───┘ │ │
│ │ │ ┌─────┴─────┐ │
│ │ └──────►│ userspace │ │
│ │ └───────────┘ │
┌───────────┐ARP │ │ │
│ │ │ │ ┌───────────────┐ │
│ host1 ├────►├──┴───┐ │ │ │
│ │ │ eth1 ├────┤ br-10 │ │
└───────────┘ ├──────┘ │ │ │
│ └───────────────┘ │
│ │
│ │
│ leaf1 │
└────────────────────────────────┘
The PoC can be found on my GitHub repo.
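Before digging into the two halves, here is a rough sketch of how a loader might wire them together. I'm assuming the cilium/ebpf library and hypothetical names (arp_filter.o, the arp_filter program, the events map); the actual PoC may do this differently:

package main

import (
	"log"
	"net"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/ringbuf"
)

func main() {
	// load the compiled eBPF ELF, creating its maps and programs in the kernel
	coll, err := ebpf.LoadCollection("arp_filter.o")
	if err != nil {
		log.Fatal(err)
	}
	defer coll.Close()

	// the interface we listen on, i.e. the --attach-to flag
	iface, err := net.InterfaceByName("eth2")
	if err != nil {
		log.Fatal(err)
	}

	// attach on ingress so we see the ARP frames arriving on the interface
	// (TCX needs kernel >= 6.6; older kernels would use the netlink TC API)
	l, err := link.AttachTCX(link.TCXOptions{
		Interface: iface.Index,
		Program:   coll.Programs["arp_filter"],
		Attach:    ebpf.AttachTCXIngress,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	// the ring buffer reader feeding the event loop shown later
	rd, err := ringbuf.NewReader(coll.Maps["events"])
	if err != nil {
		log.Fatal(err)
	}
	defer rd.Close()

	// ... event loop, see "On user space" below ...
}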
The eBPF part
The eBPF part is pretty simple (omitting all the checks to make the verifier happy):
// hardware (MAC) and protocol (IP) address lengths from the ARP header
__u16 hlen = arph->ar_hln;
__u16 plen = arph->ar_pln;
__u32 opCode = bpf_ntohs(arph->ar_op);
// the sender MAC and IP sit right after the fixed-size ARP header
void *senderHwAddress = (void *)arph + sizeof(struct arphdr);
void *senderProtoAddress = senderHwAddress + hlen;
// reserve an event in the ring buffer shared with user space
// (the verifier-mandated NULL check is among the omitted checks)
struct event *arp_event;
arp_event = bpf_ringbuf_reserve(&events, sizeof(struct event), 0);
// copy sender MAC and IP into the event, then hand it to user space
bpf_core_read_str(arp_event->senderHWvalue, hlen + 1, senderHwAddress);
bpf_core_read_str(arp_event->senderProtoValue, plen + 1, senderProtoAddress);
arp_event->opCode = opCode;
bpf_ringbuf_submit(arp_event, 0);
It gets the sender IP and MAC address from the ARP request, fills an event and sends it to user space via a ring buffer.
Simple and bulletproof.
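On the receiving end, user space decodes these events from raw bytes, so it needs a Go struct whose memory layout matches the C struct event. A sketch of what arpEvent could look like; the field order and the MAC / IPv4 sizes (6 and 4 bytes, plus the NUL appended by bpf_core_read_str) are assumptions, not taken from the PoC:

// arpEvent must mirror the C struct event byte for byte
type arpEvent struct {
	OpCode           uint32  // ARP opcode (request / reply)
	SenderHWvalue    [7]byte // 6-byte MAC + trailing NUL
	SenderProtoValue [5]byte // 4-byte IPv4 address + trailing NUL
}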
On user space
The user space side is simple as well (error handling omitted):
for {
	// block until the eBPF program publishes a new ARP event
	record, _ := rd.Read()

	var event arpEvent
	binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event)

	ip := bytesToIP(event.SenderProtoValue)
	mac := bytesToMac(event.SenderHWvalue)

	// refresh each IP at most once per minute to avoid hammering netlink
	when := added[ip]
	now := time.Now()
	if now.Sub(when) > time.Minute {
		added[ip] = now
		neighborAdd(devID.Index, ip, mac)
	}
}
It:
- receives the ARP related information from kernel space
- checks whether an entry for that IP was already added within the last minute
- if not, adds the record to the neighbor table via netlink (a sketch of neighborAdd follows)
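The interesting bit is neighborAdd. A minimal sketch of what it could look like with the vishvananda/netlink package (an assumption on my part; the PoC may build the netlink message differently). It is the programmatic equivalent of ip neigh replace <ip> lladdr <mac> dev <iface>:

package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

// neighborAdd installs (or refreshes) a neighbor entry on the given link,
// marking it REACHABLE so that zebra can pick it up and advertise it
func neighborAdd(linkIndex int, ip net.IP, mac net.HardwareAddr) error {
	return netlink.NeighSet(&netlink.Neigh{
		LinkIndex:    linkIndex,
		State:        netlink.NUD_REACHABLE,
		IP:           ip,
		HardwareAddr: mac,
	})
}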
Does it work?
Let's try it with the same layout presented in the L2 EVPN post.
When we start, the neigh table does not contain the entries related to HOST1 or HOST2:
docker exec -it clab-evpnl2-leaf1 ip neigh show
192.168.1.0 dev eth1 lladdr aa:c1:ab:94:bd:76 REACHABLE
fe80::a8bb:ccff:fe00:64 dev br10 lladdr aa:bb:cc:00:00:64 extern_learn NOARP proto zebra
We run the program on both leaves:
leaf1:/# ./fill-neighbor --attach-to eth2 --from-interface br10
leaf2:/# ./fill-neighbor --attach-to eth2 --from-interface br10
And then we ping HOST2 from HOST1:
docker exec clab-evpnl2-HOST1 bash -c "ping -c 1 192.168.10.3"
PING 192.168.10.3 (192.168.10.3) 56(84) bytes of data.
64 bytes from 192.168.10.3: icmp_seq=1 ttl=64 time=0.616 ms
--- 192.168.10.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.616/0.616/0.616/0.000 ms
Let’s check the neighbor table of leaf1:
docker exec -it clab-evpnl2-leaf1 ip neigh show
192.168.10.2 dev br10 lladdr aa:c1:ab:3b:a1:fd STALE
192.168.1.0 dev eth1 lladdr aa:c1:ab:94:bd:76 STALE
192.168.10.3 dev br10 lladdr aa:c1:ab:69:b5:9e extern_learn NOARP proto zebra
fe80::a8bb:ccff:fe00:64 dev br10 lladdr aa:bb:cc:00:00:64 extern_learn NOARP proto zebra
leaf1's neighbor table now contains:
- an entry related to HOST1 (192.168.10.2), filled by the local fill-neighbor as an effect of the ARP request coming through eth2
- an entry related to HOST2, because leaf2's local table was filled by its fill-neighbor as an effect of the ARP reply coming through its eth2, and then FRR announced it as a type 2 EVPN route to leaf1
ARP is now resolved locally on leaf2, suppressing ARP broadcasts across the fabric.
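If you want to double check the suppression itself, one option (not part of the PoC) is to run tcpdump on leaf2's fabric-facing interface while repeating the ping: with the neighbor entry in place, no ARP broadcast should leave towards the VXLAN fabric. Replace <fabric-iface> with whatever interface carries VXLAN in your lab:

docker exec -it clab-evpnl2-leaf2 tcpdump -eni <fabric-iface> arp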
Things to improve
This is a simple example that clearly needs to be extended. It lacks IPv6 support and, more importantly, the table must be kept in sync in two ways:
- by pinging the neighbor, to check it is still there and to refresh the local neigh table
- by purging the entry from the map in case the neighbor is not there anymore (a rough sketch of this follows)
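To give a taste of the purging side, a rough sketch, assuming the added map is keyed by netip.Addr and again using vishvananda/netlink (both assumptions):

package main

import (
	"net"
	"net/netip"
	"time"

	"github.com/vishvananda/netlink"
)

// purgeStale removes entries we have not seen ARP traffic for recently,
// both from our bookkeeping map and from the kernel neighbor table.
// The five-minute threshold is arbitrary.
func purgeStale(linkIndex int, added map[netip.Addr]time.Time) {
	for ip, last := range added {
		if time.Since(last) > 5*time.Minute {
			delete(added, ip) // deleting during range is safe in Go
			// the programmatic equivalent of: ip neigh del <ip> dev <iface>
			netlink.NeighDel(&netlink.Neigh{
				LinkIndex: linkIndex,
				IP:        net.IP(ip.AsSlice()),
			})
		}
	}
}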
Conclusion
This is yet another example of how eBPF programs can be very simple yet enable powerful kernel features. Here we just observe (and parse) the ARP requests, and use that information to keep the neighbor table up to date via netlink.