As a founding member of the Open Compute Project (OCP) community, Facebook embraces a collaborative approach to solving industry-wide problems. One of the more pressing problems facing the industry today is how to stay ahead of increasing power demands from artificial intelligence and networking. To address this concern, we are announcing a new initiative to drive convergence and build uniformity around the Rack & Power design. Over the past few months, we have been working to develop a new architecture based on Open Rack with input from Microsoft. This initiative has several goals:
- Enable greater sharing between Microsoft and Facebook through a common OCP rack architecture
- Provide a flexible frame and power infrastructure capable of supporting a wide range of solutions from across the OCP community
- Enable additional features beyond those needed by Facebook for the larger community, such as physical security for solutions deployed in co-location facilities
- Enable new thermal solutions now under development by the Advanced Cooling Solutions sub-project, including:
  - Liquid cooling manifolds with either manual or hot-plug interfaces
  - Door-based heat exchangers with defined attachment points and thermal characteristics
  - Defined physical and thermal interfaces between cooling solutions and the DC infrastructure
- Develop power and battery backup solutions that scale across different rack power levels while accommodating different power input types
Why a new version of Open Rack?
As the industry turns to AI and ML to solve more difficult problems, the systems that are used to drive these solutions are increasing the power density at both the component level and the system level. Ever-increasing bandwidth speeds are driving similar issues with networking systems. At the component level, we are seeing power densities of a variety of processors and networking chips that will be beyond the ability of air to cool in the near future. At the system level, AI hardware solutions will continue to drive higher power densities. It’s a common approach in these sorts of systems to get memory, processors, and system fabrics as close together as possible to improve overall system performance.
On the power delivery side, processing silicon trends point to systems consuming an ever-larger percentage of the total power in data centers. Given a limited data center power budget, efficiency becomes increasingly important to prevent these systems from displacing server and storage systems.
As several members of the OCP community have shown, 48V power delivery in Open Rack has certain advantages over 12V distribution:
- Distributing power within the rack (between the power shelf and the IT gear) is more efficient at 48V than 12V because the same power is delivered at one-quarter the current, which reduces resistive losses in the distribution path.
- Most network systems are already capable of supporting 48V input power, since they are widely deployed in traditional telecom spaces.
- New GPU and ML technologies with support for native 48V are emerging.
- Very fast changes in current draw from power-dense systems (dI/dt) are much easier to solve with 48V.
- At the printed wiring assembly (PWA) level, 48V power distribution allows for increased density because connectors, voltage regulation components, and copper planes are all smaller.
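The efficiency point in the list above can be illustrated with a quick back-of-the-envelope calculation. The sketch below is a simplified model, not a measurement: the rack load (`LOAD_WATTS`) and the distribution path resistance (`PATH_OHMS`) are assumed values chosen only for illustration, and the current is computed from the load power at the nominal bus voltage, ignoring voltage drop along the path.

```python
def distribution_loss_watts(load_watts: float, bus_volts: float, path_ohms: float) -> float:
    """Resistive (I^2 * R) loss in the distribution path for a given load."""
    current_amps = load_watts / bus_volts  # I = P / V
    return current_amps ** 2 * path_ohms   # P_loss = I^2 * R


# Assumed, illustrative values (not taken from any specification).
LOAD_WATTS = 12_000.0  # hypothetical rack load
PATH_OHMS = 0.0005     # hypothetical busbar and connector resistance

loss_12v = distribution_loss_watts(LOAD_WATTS, 12.0, PATH_OHMS)
loss_48v = distribution_loss_watts(LOAD_WATTS, 48.0, PATH_OHMS)

print(f"12V bus: {LOAD_WATTS / 12.0:.0f} A, {loss_12v:.2f} W lost")
print(f"48V bus: {LOAD_WATTS / 48.0:.0f} A, {loss_48v:.2f} W lost")
print(f"Loss ratio (12V / 48V): {loss_12v / loss_48v:.1f}x")
```

Under these assumptions, quadrupling the bus voltage cuts the current to one quarter, so the loss through the same path falls by a factor of 16 regardless of the resistance value chosen.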
As we face power density challenges at both the component and system levels, liquid cooling is a logical solution. Liquid cooling has been used successfully in computing systems for decades for similar engineering reasons. Last year, OCP announced the creation of the Advanced Cooling Solutions sub-project within the Rack & Power project. The community wants the advantages of these cooling solutions with the price, reliability, and adaptability required for large-scale deployments. Community members are developing specifications in three different areas:
- Direct contact liquid loops
- Door-based heat exchangers
- Fully immersed systems
We believe the next generation of Open Rack will provide even greater benefits to the OCP community than the current version (Open Rack V2). The flexible and reliable architecture in V2 reduces the cost, weight, volume, and thermal complexity caused by the power systems of traditional EIA white box IT gear. For this next version, we are collaborating to create flexible, interoperable, and scalable solutions for the community through a common OCP architecture. Accomplishing this goal will enable wider adoption of OCP technologies across multiple industries, which will benefit operators, solution providers, original design manufacturers, and configuration managers.
We have released the first pass of the frame specification to the community for review and aim to have prototypes later this year. Facebook and Microsoft encourage other OCP community members to join these subgroups as we develop the specifications that will drive the frame and power components. Now is the time to get involved.
These solutions will be driven by several different OCP initiatives.
Rack & Power project:
- Converged rack frame
- Flexible power shelf
- Universal AC power interconnect
- Pluggable DC power shelf output interconnect
- Battery backup systems
Advanced Cooling Solutions sub-project:
- Direct-contact manifold
- Manual and hot-pluggable drip-less valves
- Door-based heat exchangers