Jump-Start: Boosting VM performance

What the research is:

Jump-Start is a new approach for improving the performance of virtual machines at scale. Virtual machines are a popular way to implement the programming languages used to build applications, including large-scale websites like Facebook and Instagram. However, virtual machines incur well-known overhead in memory and CPU usage, particularly during the application’s warm-up phase, when a just-in-time (JIT) compiler profiles the code and translates it from the virtual machine’s abstract language into real machine code. We developed Jump-Start to reduce this warm-up overhead.

Jump-Start has been implemented in the HipHop Virtual Machine (HHVM), which powers not only Facebook.com but also many other sites across the web. HHVM Jump-Start has been deployed across our data centers, and our evaluation shows that it reduces HHVM’s overhead during warm-up by 54.9 percent for Facebook’s apps and websites. Jump-Start has also been used to improve our website’s steady-state performance, i.e., its performance after HHVM has warmed up, by 5.4 percent. Jump-Start is not only the first technique to address the warm-up overhead of virtual machines at scale but also the first to boost steady-state performance.

How it works:

Like many advanced JIT compilers, the HHVM JIT compiles code twice to produce high-performance machine code: first to collect profile data about the application’s behavior, and then to produce optimized code that leverages that profile data. Although this approach results in much better steady-state performance, it also incurs significant warm-up overhead, both from compiling the code twice and from waiting for profile data to be collected.
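The two-step scheme described above can be illustrated with a toy model. This is only a sketch, not HHVM's implementation: the class `TwoTierFunction`, the constant `PROFILE_THRESHOLD`, and the idea of profiling argument types are all assumptions chosen to keep the example small.

```python
from collections import Counter

PROFILE_THRESHOLD = 3  # hypothetical: profiled calls to observe before optimizing


class TwoTierFunction:
    """Toy model of a function compiled twice: a profiling tier, then an optimized tier."""

    def __init__(self, generic_impl, specializations):
        self.generic_impl = generic_impl
        self.specializations = specializations  # maps argument type -> fast path
        self.type_counts = Counter()            # the collected "profile data"
        self.calls = 0                          # calls executed in the profiling tier
        self.optimized = None

    def __call__(self, arg):
        if self.optimized is not None:
            return self.optimized(arg)
        # Profiling tier: run the generic code and record what is observed.
        self.type_counts[type(arg)] += 1
        self.calls += 1
        result = self.generic_impl(arg)
        if self.calls >= PROFILE_THRESHOLD:
            self.optimized = self._optimize()
        return result

    def _optimize(self):
        # Second "compilation": specialize for the most frequently seen type.
        hot_type, _ = self.type_counts.most_common(1)[0]
        fast = self.specializations.get(hot_type)
        if fast is None:
            return self.generic_impl

        def optimized(arg):
            # Guard: take the fast path only for the profiled type.
            if type(arg) is hot_type:
                return fast(arg)
            return self.generic_impl(arg)

        return optimized


def generic_double(x):
    return x + x


double = TwoTierFunction(generic_double, {int: lambda x: x << 1})
results = [double(i) for i in range(5)]
```

In this sketch the first three calls pay the profiling cost before the specialized path becomes available, which is a small-scale analogue of the warm-up overhead discussed above.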

To greatly reduce HHVM’s warm-up overhead, Jump-Start leverages our phased rollout. To keep Facebook running smoothly and add new features, we deploy updates every few hours. Each time, our global fleet of web servers (running HHVM) is restarted in three phases. In the first phase (C1), we restart a very small fraction of the servers. In the second phase (C2), we restart approximately 2 percent of the servers. Finally, in the third phase (C3), we restart the remaining servers. This phased rollout was designed to provide enough signal about the health of the new version to allow us to halt the rollout, if necessary, before the update reaches the entire server fleet.
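A minimal sketch of such a three-phase restart is shown below. The function names, the `restart` and `is_healthy` callbacks, and the halting logic are hypothetical. The C1 fraction of 0.1 percent is an assumption (the text says only "a very small fraction"), while the 2 percent for C2 follows the text.

```python
def plan_rollout(servers, c1_fraction=0.001, c2_fraction=0.02):
    """Split the fleet into the three phases (c1_fraction is an assumed value)."""
    n = len(servers)
    c1_end = max(1, round(n * c1_fraction))
    c2_end = c1_end + max(1, round(n * c2_fraction))
    return {
        "C1": servers[:c1_end],
        "C2": servers[c1_end:c2_end],
        "C3": servers[c2_end:],
    }


def roll_out(servers, restart, is_healthy):
    """Restart phase by phase; stop before the next phase if a phase looks unhealthy.

    Returns (phase_that_halted_or_None, list_of_restarted_servers).
    """
    restarted = []
    for phase, group in plan_rollout(servers).items():
        for server in group:
            restart(server)
        restarted.extend(group)
        if not all(is_healthy(server) for server in group):
            return phase, restarted
    return None, restarted


fleet = [f"web{i:04d}" for i in range(1000)]
plan = plan_rollout(fleet)

# Simulate a bad release that only shows up on one C2 server.
bad_server = plan["C2"][0]
halted_at, touched = roll_out(
    fleet,
    restart=lambda server: None,
    is_healthy=lambda server: server != bad_server,
)
```

With a simulated unhealthy server in C2, the sketch halts after restarting only the C1 and C2 servers, leaving C3 on the previous version.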

Jump-Start uses this phased rollout to avoid some of HHVM’s overhead during warm-up. More specifically, the profile data collected by the servers in C2 is shared with all the servers in C3. This way, the vast majority of our web servers can skip both compiling the profiling code and executing it to collect profile data. Furthermore, because only a small portion of the server fleet experiences the JIT profiling overhead, Jump-Start allows for more thorough profiling of the application. This enabled us not only to improve the effectiveness of HHVM’s existing profile-guided optimizations but also to add new optimizations that further improved HHVM’s steady-state performance.
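The profile-sharing idea can be sketched as follows. Everything here is illustrative: the `Server` class, the JSON serialization, the call-count profile, and `HOT_THRESHOLD` are assumptions and do not reflect HHVM's actual profile format. In particular, running functions that are absent from the shared profile as "unoptimized" is a simplification of this sketch.

```python
import json

HOT_THRESHOLD = 2  # hypothetical: calls needed before a function counts as hot


class Server:
    """Toy web server that either profiles for itself or starts from a shared profile."""

    def __init__(self, serialized_profile=None):
        self.call_counts = {}
        self.profiling_compiles = 0
        self.optimized = set()
        self.jump_started = serialized_profile is not None
        if self.jump_started:
            # C3-style start: optimize hot functions immediately from shared data.
            shared = json.loads(serialized_profile)
            self.optimized = {f for f, n in shared.items() if n >= HOT_THRESHOLD}

    def handle(self, func):
        if func in self.optimized:
            return "optimized"
        if self.jump_started:
            # Simplification: no local profiling on jump-started servers.
            return "unoptimized"
        # C2-style path: compile profiling code once per function, then count calls.
        if func not in self.call_counts:
            self.profiling_compiles += 1
        self.call_counts[func] = self.call_counts.get(func, 0) + 1
        if self.call_counts[func] >= HOT_THRESHOLD:
            self.optimized.add(func)
        return "profiling"

    def export_profile(self):
        return json.dumps(self.call_counts)


workload = ["render", "render", "auth", "render"]

c2 = Server()                          # collects the profile
c2_modes = [c2.handle(f) for f in workload]

c3 = Server(c2.export_profile())       # starts from the shared profile
c3_modes = [c3.handle(f) for f in workload]
```

Here the C2-style server spends its first calls in profiling mode, while the C3-style server runs the hot function in optimized form from its first request and never compiles profiling code.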

Why it matters:

The Jump-Start technique has significantly improved HHVM’s warm-up and steady-state performance. These improvements have enabled the continuous deployment of Facebook, which both improves developers’ productivity and increases the speed at which new features reach the people who use Facebook. By improving HHVM’s performance during warm-up, Jump-Start also reduces the latency observed by the people who use Facebook and allows for seamless updates to the website. By improving HHVM’s steady-state performance, Jump-Start improves the efficiency and reduces the footprint of the server fleet that powers Facebook. Although this work describes and evaluates Jump-Start only in the context of HHVM, the same approach can be used to improve the performance of other virtual machines.