This week saw the publication of RFC 7938, which proposes a practical routing design for use in large-scale data centres. Such data centres typically comprise over one hundred thousand servers, and increasingly their traffic flows from server to server rather than into and out of the data centre. This pattern is typical of computing clusters using Hadoop (an open-source software framework for distributed storage and distributed processing), data replication between clusters, and virtual machine migrations.
When the majority of traffic is entering and leaving a data centre, traditional topologies are sufficient to accommodate such flows, and if more bandwidth is required it can be added by upgrading the network line cards or replacing devices with ones that have higher port densities. This becomes more difficult when traffic increasingly traverses the data centre, and there’s a need for a more dynamic approach in order to simplify operations and improve network stability.
The RFC reports that experimentation and extensive testing have shown External BGP (eBGP) to be well suited as a standalone routing protocol for these types of data centre applications. As well as presenting an overview of network design requirements and considerations, it advances arguments for selecting eBGP, sets out its advantages over traditional interior gateway protocols, and dispels some of the traditional perceptions about the suitability of BGP for such applications.
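To give a flavour of what an eBGP-only design involves, here is a minimal sketch of the kind of AS numbering plan the RFC describes, as I read its guidelines: private-use ASNs, one ASN shared by all Tier 1 devices, one ASN per cluster of Tier 2 devices, and a unique ASN for every top-of-rack (ToR) switch. The function name, the cluster names and the dictionary layout are my own illustrative assumptions, not anything taken from the RFC.

```python
from itertools import count

# Private-use 16-bit ASN range (reserved by RFC 6996).
PRIVATE_ASN_FIRST = 64512
PRIVATE_ASN_LAST = 65534


def allocate_asns(clusters):
    """Sketch an ASN plan for a three-tier Clos fabric running only eBGP.

    `clusters` maps a hypothetical cluster name to its number of ToR switches.
    """
    next_asn = count(PRIVATE_ASN_FIRST)

    def take():
        # Hand out the next unused private ASN, failing once the range runs out.
        asn = next(next_asn)
        if asn > PRIVATE_ASN_LAST:
            raise ValueError("private 16-bit ASN range exhausted")
        return asn

    # All Tier 1 devices share a single ASN.
    plan = {"tier1": take(), "clusters": {}}
    for name, tor_count in clusters.items():
        plan["clusters"][name] = {
            # The Tier 2 devices of one cluster share an ASN.
            "tier2": take(),
            # Every ToR switch gets an ASN of its own.
            "tors": [take() for _ in range(tor_count)],
        }
    return plan


plan = allocate_asns({"cluster-a": 2, "cluster-b": 3})
print(plan)
```

The private 16-bit range holds only 1023 numbers, so a naive plan like this one runs out quickly in a really large fabric. The RFC discusses ways around that, such as reusing ToR ASNs across clusters or moving to four-octet ASNs, which this sketch does not attempt.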
Plenty of examples are included in the document, so it’s well worth reading if you’re running a data centre. You can also check out the presentation at NANOG that explains the background to this.