Mux data plane
BIRD installs routes from all clients on table 20000. Packets received on the upstream interface are routed according to this table by an ip rule (with priority 20000).
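As a rough sketch (not the deployment's actual provisioning code; the interface name and the use of subprocess are assumptions), the rule could be installed like this:

```python
import subprocess

CLIENT_TABLE = 20000  # table holding routes learned from clients

def install_client_rule(upstream_iface: str) -> None:
    """Route packets received on the given upstream-facing interface
    using the client-routes table (priority 20000, as described above)."""
    subprocess.run(
        ["ip", "rule", "add", "iif", upstream_iface,
         "lookup", str(CLIENT_TABLE), "priority", str(CLIENT_TABLE)],
        check=True,
    )
```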
We create one macvlan interface (called upstream{peer}) for each upstream network. These macvlans are attached to the OpenVPN tunnel and directly accessible by clients. Each macvlan gets assigned IP addresses in the following format:
100.{64+mux}.{upstream >> 8}.{upstream % 256}
2804:269c:ffff:{mux}::{upstream}
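For concreteness, a minimal sketch of these formulas (the function name and the literal substitution of the ids into the IPv6 groups are our assumptions):

```python
def macvlan_addresses(mux: int, upstream: int) -> tuple[str, str]:
    """IPv4/IPv6 addresses of the macvlan for a (mux, upstream) pair,
    mirroring the formulas above.  The ids are substituted literally
    into the IPv6 address; whether the deployment renders them as
    decimal or hexadecimal groups is defined by its templates."""
    v4 = f"100.{64 + mux}.{upstream >> 8}.{upstream % 256}"
    v6 = f"2804:269c:ffff:{mux}::{upstream}"
    return v4, v6

# Example: mux 3, upstream 517 -> ("100.67.2.5", "2804:269c:ffff:3::517")
print(macvlan_addresses(3, 517))
```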
BIRD establishes upstream sessions and populates one routing table for each upstream network. Routes from upstream i are installed in table 10000+i. BIRD sets the bgp_next_hop on routes from upstream i announced to clients to the IP address on interface upstream{i}. When clients want to send packets to upstream i, ARP/NDP resolves the MAC address on the macvlan interface. (The main trick is to let the mux know which upstream network to use based on the destination MAC on packets received from clients.) Packets arriving on macvlan interface upstream{i} are routed to routing table 10000+i by an ip rule.
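A sketch of the per-upstream rule (the iproute2 invocation and its wrapper function are ours; the deployment may install these rules differently):

```python
import subprocess

def install_upstream_rule(i: int) -> None:
    """Packets delivered to macvlan upstream{i} (i.e., sent by a client
    to that macvlan's MAC) are looked up in table 10000+i, which holds
    the routes learned from upstream i."""
    subprocess.run(
        ["ip", "rule", "add", "iif", f"upstream{i}",
         "lookup", str(10000 + i)],
        check=True,
    )
```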
- Mux and upstream identifiers are global, so there is no collision of macvlan IP addresses among different upstreams.
- Normally, Linux would reply to ARP/NDP requests for the IP addresses on the macvlan interfaces using the MAC address of the OpenVPN tap interface (where the ARP packets arrive). To force the kernel to reply to ARP requests with the macvlan's MAC, we set net.ipv4.conf.{all,default}.arp_ignore = 1 and net.ipv4.conf.{all,default}.arp_announce = 2. NDP works as expected (although this was not tested without the above two configurations).
Each mux has a tap0 interface where all OpenVPN tunnels terminate. To make all the above macvlan addresses reachable to the client, the OpenVPN server at the mux tells clients to route the entirety of 100.{64+mux}.0.0/16 and 2804:269c:ffff:{mux}::/64 through the tunnel. If a client connects to multiple muxes, each tunnel will be attached to different /16s and /64s.
Muxes and clients are allocated IP addresses on the upper half of the prefixes, i.e., 100.{64+mux}.128.0/17 and 2804:269c:ffff:{mux}:0:1::/80. (Note that upstream macvlans are allocated IP addresses on the lower half of the prefixes, i.e., 100.{64+mux}.0.0/17 and 2804:269c:ffff:{mux}:0:0::/80. Note also that IPv4 limits the maximum number of peers the platform can support to approximately 2^15.)
The mux is always the first address on the upper half: 100.{64+mux}.128.1 and 2804:269c:ffff:{mux}:0:1::1. Clients are allocated subsequent addresses by DHCP. If a client establishes multiple simultaneous connections to a mux (all previous instances of this happened by accident), each connection will have a separate tunnel and be allocated a different IP address.
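The per-mux split can be sketched with Python's ipaddress module (the function and names below are illustrative, not taken from the deployment):

```python
import ipaddress

def mux_prefixes(mux: int) -> dict:
    """Per-mux IPv4 plan described above: the whole /16 is routed
    through the tunnel, upstream macvlans use the lower /17, muxes and
    clients use the upper /17, and the mux is the first upper-half
    address.  The IPv6 /64-to-/80 split is analogous."""
    whole = ipaddress.ip_network(f"100.{64 + mux}.0.0/16")
    lower, upper = whole.subnets(prefixlen_diff=1)  # two /17s
    return {
        "tunnel_route": whole,                     # pushed by OpenVPN
        "upstream_half": lower,                    # 100.{64+mux}.0.0/17
        "client_half": upper,                      # 100.{64+mux}.128.0/17
        "mux_address": upper.network_address + 1,  # 100.{64+mux}.128.1
    }

# Example: mux 3 -> mux address 100.67.128.1
print(mux_prefixes(3)["mux_address"])
```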
The first /24 inside 100.127.0.0/16 is allocated to AL2S interfaces. Muxes connected to AL2S use IPv4 address 100.127.0.{{ id }} and IPv6 address 2804:269c:ff01::{{ id }}.
For the IP addressing of remote upstreams, we reserved the 100.126.0.0/16 range in order to create a unique address for each remote upstream, in the format 100.126.X.X, where X.X follows the same formula as for local upstreams (based on peer.id). For IPv6, we have reserved 2804:269c:ff02::/48. Only the mux that connects to a specific remote peer has the corresponding IP address added to its sub-VLAN interface.
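A small sketch of the remote-upstream formula (the function name is ours, and the exact octet encoding is inferred from the local-upstream formula above; the text gives only the IPv4 format, so IPv6 is omitted):

```python
def remote_upstream_address(peer_id: int) -> str:
    """IPv4 address of a remote upstream inside the reserved
    100.126.0.0/16 range; the last two octets encode the peer id the
    same way as for local upstream macvlans."""
    return f"100.126.{peer_id >> 8}.{peer_id % 256}"

# Example: peer id 517 -> 100.126.2.5
print(remote_upstream_address(517))
```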
We use a veth pair to connect user containers running on muxes to the bird/openvpn network namespace. These veths use a /30 prefix (the .0 address is the network address, .1 is the mux's end of the veth pair, .2 is the container's end of the veth pair, and .3 is the broadcast address). The /30 is generated as 100.125.{M<<3}.{E*4}/30, where M is the mux id and E is the experiment's container id. For IPv6 we use 2804:269c:ff03:M::E:0/112, with a similar association of IPs and interfaces.
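A sketch of the /30 computation (names are ours):

```python
import ipaddress

def veth_prefix(mux_id: int, experiment_id: int):
    """Compute the /30 for the veth pair between an experiment
    container and the bird/openvpn namespace: .0 is the network
    address, .1 the mux's end, .2 the container's end, .3 broadcast."""
    net = ipaddress.ip_network(
        f"100.125.{mux_id << 3}.{experiment_id * 4}/30")
    mux_end, container_end = net.hosts()   # the .1 and .2 addresses
    return net, mux_end, container_end

# Example: mux 2, experiment 5 -> 100.125.16.20/30, ends .21 and .22
print(veth_prefix(2, 5))
```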
- The authoritative source for addressing information is settings.py on the website. Several constants starting with SRVMGR_ define the output of the Python code and Jinja2 templates.