Managing networks at ever-growing scale brings inherent challenges around performance, deployment, and operations.
At Meta, we’ve found that these challenges broadly fall into three themes:
1. Data center networking: Over the past decade, on the physical front, we have seen a rise in vendor-specific hardware with heterogeneous feature and architecture sets (e.g., non-blocking architectures). On the software side, there has been a massive increase in scale and capacity demand (on the order of megawatts per physical building) to manage hyperscale architectures such as ours. The pivot to the metaverse has also led to a significant increase in AI, HPC, and machine learning workloads that demand huge networking bandwidth and compute capacity, and that pose challenges around the safe coexistence of existing web, legacy, and modern workloads.
2. WAN optimizations: Over the last few years, content creation has grown rapidly, fueled by a growing creator economy and by hybrid and remote work, which has placed huge capacity and network bandwidth demands on backbone networks.
3. Operational efficiency and metrics improvements: Traditional network metrics such as packet loss and jitter are specific to the network or host and do not correlate application behavior with network performance.
At the recent Networking@Scale virtual conference in November 2022, engineers from Meta discussed these challenges and presented solutions across these themes that help bring better network performance than ever to people using our family of apps:
Developing, deploying, operating in-house network switches at a massive scale
Shrikrishna Khare, Software Engineer, Meta
Srikrishna Gopu, Software Engineer, Meta
FBOSS is one of the largest services at Meta and powers Meta’s network. Presenters Shrikrishna Khare and Srikrishna Gopu talk about their experience designing, developing, and operating FBOSS: in-house software built to manage and support the set of features required for the data center switches of a large-scale internet content provider. They present the key ideas underpinning the FBOSS model that helped them build a stable and scalable network.
The presentation also introduces the Switch Abstraction Interface (SAI) layer, which defines a vendor-independent API for programming the forwarding ASIC. The new FBOSS implementation was deployed at massive scale into a brownfield environment and was also leveraged to onboard a new switch vendor into Meta’s infrastructure.
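To make the vendor-independent layer concrete, here is a minimal sketch in Python of what such an abstraction could look like. SAI itself is a C API, and the `ForwardingAsic`, `VendorXAsic`, and `Route` names below are hypothetical illustrations, not actual SAI or FBOSS code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g., "10.0.0.0/24"
    next_hop: str    # e.g., "10.0.1.1"


class ForwardingAsic(ABC):
    """Vendor-independent contract, analogous in spirit to SAI."""

    @abstractmethod
    def program_route(self, route: Route) -> None: ...

    @abstractmethod
    def remove_route(self, prefix: str) -> None: ...


class VendorXAsic(ForwardingAsic):
    """One vendor's implementation; the switch software above this
    layer never sees vendor-specific details."""

    def program_route(self, route: Route) -> None:
        # A real implementation would translate to the vendor SDK here.
        print(f"vendorX: install {route.prefix} -> {route.next_hop}")

    def remove_route(self, prefix: str) -> None:
        print(f"vendorX: delete {prefix}")


def sync_fib(asic: ForwardingAsic, routes: list[Route]) -> None:
    # Switch software programs any compliant ASIC the same way,
    # which is what makes onboarding a new vendor tractable.
    for r in routes:
        asic.program_route(r)


sync_fib(VendorXAsic(), [Route("10.0.0.0/24", "10.0.1.1")])
```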
Wiring the planet: Scaling Meta’s global optical network
Stephen Grubb, Optical Engineer, Meta
Joseph Kakande, Network Engineer, Meta
Stephen Grubb and Joseph Kakande talk about the expansive global fiber network built and managed by Backbone Engineering (BBE), the team that plans, designs, builds, and supports the global network interconnecting Meta’s data centers (DCs) and points of presence (POPs) with the internet. They highlight the submarine fiber-optic systems being built to connect the globe.
This talk showcases Bifrost and Echo, the first cable systems to directly connect the US and Singapore, which will support SGA, Meta’s first APAC data center. The presenters also discuss the vast 2Africa project, the world’s largest submarine cable network, with the potential to connect 3 billion people. The talk also covers how our submarine networks connect to our terrestrial backbone, and how Meta designs and builds the hierarchies of the optical transport layer on top of those fiber paths. They also discuss in-house software suites, solutions for distributed provisioning and monitoring of this global fleet of hardware, and approaches to diagnosing and remediating network failures.
Millisampler: Fine-grained network traffic analysis
Yimeng Zhao, Research Scientist, Meta
Yimeng Zhao talks about radically improving the visibility, monitoring, and diagnosis of Meta’s planet-scale production network via innovations in traffic measurement tools.
Managing data center networks with low loss requires understanding traffic patterns, especially traffic burstiness, at fine time granularity. Yet monitoring traffic at millisecond granularity fleet-wide is challenging. To gain more visibility into our production network, Meta built Millisampler, a lightweight, BPF-based traffic measurement tool that operates at fine-grained timescales, and deployed it on every server in the fleet for continual monitoring.
Millisampler data allows us to characterize microbursts at millisecond or even microsecond granularity, and simultaneous data collection across servers enables analysis of how synchronized bursts interact in rack buffers. This talk covers the design, implementation, and production experience with Millisampler, as well as some interesting observations drawn from Millisampler data.
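To illustrate the core idea (this is a conceptual sketch, not Millisampler’s actual BPF implementation; the `detect_microbursts` function and its parameters are made up for the example), the snippet below buckets packets into 1 ms windows and flags windows whose byte count approaches line rate:

```python
from collections import defaultdict

WINDOW_US = 1_000  # 1 ms windows, expressed in microseconds


def detect_microbursts(packets, line_rate_bps=25e9, threshold=0.8):
    """packets: iterable of (timestamp_us, size_bytes) tuples.

    Returns the windows whose observed bytes exceed `threshold`
    of what the link could carry in one window. Bursts this short
    are invisible to per-second counters, which is the point of
    millisecond-granularity measurement.
    """
    window_bytes = defaultdict(int)
    for ts_us, size in packets:
        window_bytes[ts_us // WINDOW_US] += size

    # Bytes a link at line_rate_bps can carry in one window.
    capacity = line_rate_bps / 8 * (WINDOW_US / 1e6)
    return {w: b for w, b in window_bytes.items() if b > threshold * capacity}


# Example: ~3 MB of 1500-byte packets squeezed into the first 1 ms,
# roughly 96% of what a 25 Gbps link can carry in that window.
trace = [(i % 1000, 1500) for i in range(2000)]
print(detect_microbursts(trace))  # {0: 3000000}
```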
Network SLOs: Knowing when the network is the barrier to application performance
Brandon Schlinker, Research Scientist, Meta
Sharad Jaiswal, Optimization Engineer, Meta
At Meta, we need to be able to readily determine whether network conditions are responsible for instances of poor quality of experience (QoE), such as images loading slowly or video stalling during playback. Brandon Schlinker and Sharad Jaiswal from Meta’s Traffic Engineering team introduce the concept of Network SLOs, which can be thought of as a product’s “minimum network requirements” for good QoE. They describe their approach to deriving Network SLOs via a combination of statistical tools, and to operationalizing them. They also describe approaches to evaluating Network SLO compliance, and highlight case studies where these SLOs helped triage regressions in QoE, identify gaps in Meta’s edge network capacity, and surface inefficiencies in how products utilize the network.
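As a simplified illustration of compliance evaluation (the `SessionSample` fields and the 2.5 Mbps / 150 ms thresholds below are hypothetical, not Meta’s actual SLO definitions), one can compute the fraction of sessions whose network met a product’s minimum requirements:

```python
from dataclasses import dataclass


@dataclass
class SessionSample:
    goodput_kbps: float  # achieved download throughput
    rtt_ms: float        # round-trip time to the edge


# Hypothetical SLO: smooth playback needs >= 2.5 Mbps and <= 150 ms RTT.
def meets_slo(s: SessionSample, min_goodput_kbps=2500, max_rtt_ms=150):
    return s.goodput_kbps >= min_goodput_kbps and s.rtt_ms <= max_rtt_ms


def slo_compliance(samples):
    """Fraction of sessions whose network met the product's SLO.

    QoE regressions among SLO-compliant sessions point away from the
    network and toward the product or serving stack.
    """
    return sum(meets_slo(s) for s in samples) / len(samples)


sessions = [SessionSample(4000, 80),   # compliant
            SessionSample(1800, 60),   # goodput below threshold
            SessionSample(3200, 210)]  # RTT above threshold
print(f"{slo_compliance(sessions):.0%} of sessions met the SLO")  # 33%
```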
Improving L4 routing consistency at Meta
Aman Sharma, Software Engineer, Meta
Andrii Vasylevskyi, Software Engineer, Meta
Aman Sharma and Andrii Vasylevskyi talk about the design, development, use cases, and improvements of Shiv, a Layer 4 load balancing tool developed at Meta. When a large number of backends are added or removed, remappings in the network routing tables occur, breaking end-to-end connections and impacting user experience (e.g., stalled videos).
Shiv routes packets to backends using a consistent hash of the 5-tuple of the packet (namely, the source IP, destination IP, source port, destination port, and protocol). Shiv’s objective is to route packets for a connection (which all have the same 5-tuple) to the same backend for the duration of the connection and avoid connection breakage.
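Here is a minimal sketch of consistent hashing over the 5-tuple using a simple hash ring; Shiv’s actual hashing scheme may differ, and the `HashRing` class and names below are illustrative:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hashing: each backend owns many points on a ring,
    so removing one backend remaps only the flows it owned."""

    def __init__(self, backends, vnodes=100):
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def backend_for(self, five_tuple: tuple) -> str:
        # Packets with the same 5-tuple always hash to the same point,
        # and therefore to the same backend.
        idx = bisect.bisect(self._keys, _hash(repr(five_tuple))) % len(self._ring)
        return self._ring[idx][1]


flow = ("10.0.0.1", "151.101.1.1", 54321, 443, "tcp")
ring = HashRing([f"backend{i}" for i in range(8)])
before = ring.backend_for(flow)

# Remove one backend: flows it did not own, likely including this one,
# keep their assignment and their connections stay intact.
ring2 = HashRing([f"backend{i}" for i in range(8) if i != 3])
print(before, ring2.backend_for(flow))
```

Because each backend owns many points on the ring, removing a backend remaps only the flows that hashed to it, which is why most in-flight connections survive backend churn.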