MLow: Meta's low bitrate audio codec

At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.
We are working to make RTC accessible by providing a high-quality experience for everyone – even those who might not have the fastest connections or the latest phones.
As more and more people have come to rely on our products to make calls over the years, we’ve been working on new ways to ensure that every call has solid audio quality.
We’ve built the Meta Low Bitrate (MLow) codec: a new tool that improves audio quality, especially for those on slow connections.

Figure 1: Increasing complexity or bitrate usually improves quality, but good codecs achieve higher quality while balancing the other two.

RTC products use many building blocks to deliver the full experience, and audio and video codecs are among the most critical. These codecs compress the captured audio and video data so it can be sent across the internet efficiently to the recipient, keeping the experience real time. For example, raw audio captured for a typical call has a bitrate of 768 kbps (mono, 48 kHz sampling, 16-bit depth), which modern codecs are able to compress down to 25-30 kbps. This compression often comes at the cost of some quality (loss of information), but good codecs strike a balance among the trio of quality, bitrate, and complexity by exploiting deep knowledge about the nature of the audio signal and by using psychoacoustics.
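As a quick check of the arithmetic above, the raw bitrate is simply sample rate times bit depth times channel count. The function names below are illustrative only.

```python
def raw_pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_ratio(raw_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the raw one."""
    return raw_kbps / coded_kbps


# Mono, 48 kHz sampling, 16-bit samples, as in the example above.
raw = raw_pcm_bitrate_kbps(48_000, 16, channels=1)
print(raw)  # 768.0

# The 25-30 kbps range quoted for modern codecs.
for coded in (25, 30):
    print(f"{coded} kbps: {compression_ratio(raw, coded):.1f}x smaller")
```

In other words, a codec running at 25-30 kbps is sending roughly one bit for every 25 to 31 bits of captured audio.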

Building a good codec is quite challenging, which is why new codecs don’t emerge very often. The last widely known, good open-source codec was Opus, released in 2012, which has become the codec of choice for a wide variety of applications on the internet. Meta has used Opus for all its RTC needs, and so far it has served us well – helping to deliver quality calls to billions of users across the globe.

Our motivation for building a new codec

Given the massive scale of RTC usage in Meta products, we get to see how a codec performs in a range of network scenarios and how it impacts the end user’s experience. In particular, we’ve observed that a significant share of calls have poor network connections for all or part of the call. Typically, a bandwidth estimation (BWE) module detects the quality of the network, and as network quality degrades, we need to lower the codec’s operating bitrate to avoid congesting the network and keep the audio flowing, which upsets the balance among quality, bitrate, and complexity described above. Complicating matters, a video call on a poor network leaves little room for audio and pushes the audio bitrate down even further. The lowest operating point for Opus is 6 kbps, at which it runs in NarrowBand mode (0 – 4 kHz). That range does not capture all the sound frequencies produced by human voices, so the result doesn’t sound as clear or natural. Here is an example of how Opus sounds at 6 kbps, along with the corresponding reference file for comparison.

Raw reference signal:

Opus @ 6 kbps NarrowBand (NB):
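To make the bandwidth squeeze described above concrete, here is a minimal sketch of how a call could derive an audio bitrate from a bandwidth estimate. The policy, the function name `pick_audio_bitrate_kbps`, and every constant except the 6 kbps Opus floor are assumptions for illustration; this is not Meta's BWE or rate-control logic.

```python
def pick_audio_bitrate_kbps(
    estimate_kbps: float,
    video_min_kbps: float = 0.0,       # assumed video reservation; 0 for audio-only calls
    audio_floor_kbps: float = 6.0,     # lowest Opus operating point cited in the text
    audio_ceiling_kbps: float = 30.0,  # assumed upper bound for speech
) -> float:
    """Give audio whatever the estimate leaves after video, clamped to [floor, ceiling]."""
    leftover = estimate_kbps - video_min_kbps
    return max(audio_floor_kbps, min(audio_ceiling_kbps, leftover))


# Healthy network, audio-only: audio runs at its ceiling.
print(pick_audio_bitrate_kbps(100))
# Weak network with video: audio only gets what video leaves behind.
print(pick_audio_bitrate_kbps(40, video_min_kbps=30))
# Very weak network: audio is pinned at the codec floor. (A real system
# would also have to shrink or drop video here; this sketch does not.)
print(pick_audio_bitrate_kbps(20, video_min_kbps=30))
```

Under a policy like this, the quality a codec delivers at its lowest operating point decides how the call sounds on the worst connections.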

Over the last two years, we have seen development of some new machine learning (ML)-based audio codecs that provide good quality audio at very low bitrates. In October of 2022, Meta released Encodec, which achieves amazingly crisp audio quality at very low bitrates. While these AI/ML-based codecs are able to achieve great quality at low bitrates, it often comes at the expense of heavy computational cost. Consequently, only the very high-end (expensive) mobile handsets are able to run these codecs reliably, while users running on lower-end devices continue to experience audio quality issues in low-bitrate conditions. So the net impact of these newer computationally expensive codecs is actually limited to a small portion of users.

A significant number of our users still use low-end devices. For example, more than 20 percent of our calls are made on ARMv7 devices, and tens of millions of daily calls on WhatsApp are made on devices that are more than 10 years old. Given the codec choices readily available and our commitment to ensuring that all users, regardless of what device they’re on, have a quality calling experience, we clearly need a codec with very low compute requirements that still delivers high-quality audio at the lowest bitrates.

The MLow codec

We began developing a new codec in late 2021. After nearly two years of active development and testing, we are proud to announce the Meta Low Bitrate audio codec, aka MLow, which achieves roughly twice the quality of Opus at 6 kbps WB (POLQA MOS of 3.9 for MLow vs. 1.89 for Opus). Even more importantly, we achieve this quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

Figure 2 below shows a MOS (Mean Opinion Score) plot on a 1-5 scale and compares the POLQA scores between Opus and MLow at various bitrates. As the chart makes evident, MLow has a huge advantage over Opus at the lowest bitrates, where it saturates quality faster than Opus.

Figure 2: POLQA score comparing Opus (WB) versus MLow at various bitrates across a large dataset of files.

We have already fully launched MLow to all Instagram and Messenger calls and are actively rolling it out on WhatsApp—and we’ve already seen incredible improvement in user engagement driven by better audio quality.

Here are some audio samples for you to listen to. We suggest that you use your favorite pair of headphones to appreciate the striking audio-quality differences.

Opus 6 kbps NB	MLow 6 kbps WB	Reference

Being able to encode high-quality audio at lower bitrates also unlocks more effective Forward Error Correction (FEC) strategies. Compared with Opus, with MLow we can afford to pack FEC at much lower bitrates, which significantly helps to improve the audio quality in packet loss scenarios.

Here are two audio samples at 14 kbps with heavy 30 percent receiver-side packet loss.

Opus:

MLow:

Note that at this bitrate, Opus is not able to encode any inband FEC: it needs a minimum of 19 kbps to do so at 10 percent packet loss. Without FEC, audio recovery suffers.
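A rough way to see why redundancy matters so much at 30 percent loss: a frame is lost for good only if every packet carrying a copy of it is lost. The sketch below assumes packet losses are independent and that each redundant copy travels in a different packet. Real networks often lose packets in bursts, so actual gains are smaller. The function name is illustrative and this is not a model of how Opus or MLow packs FEC.

```python
def residual_loss(packet_loss: float, redundant_copies: int) -> float:
    """Chance a frame is unrecoverable, assuming independent packet losses.

    The frame is sent once as primary data plus `redundant_copies` more times,
    each in a different packet, so all 1 + redundant_copies packets must be lost.
    """
    if not 0.0 <= packet_loss <= 1.0:
        raise ValueError("packet_loss must be between 0 and 1")
    if redundant_copies < 0:
        raise ValueError("redundant_copies must be non-negative")
    return packet_loss ** (1 + redundant_copies)


# The 30 percent loss scenario from the samples above.
for copies in range(3):
    lost = residual_loss(0.30, copies)
    print(f"{copies} redundant copies: {lost:.1%} of frames unrecoverable")
```

Under these assumptions, a single redundant copy cuts unrecoverable frames from 30 percent to about 9 percent, which is why a codec that can fit redundancy into a small bit budget has an advantage under loss.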

MLow internals

MLow builds on the concepts of a classic CELP (Code Excited Linear Prediction) codec, with advancements in excitation generation, parameter quantization, and coding schemes. Figure 3 is a high-level view of how the codec works internally. On the left, an input signal (raw PCM audio) feeds into the encoder, which splits the signal into two bands, one low-frequency and one high-frequency. Each band is then encoded separately while making use of shared information to achieve better compression. All the output is passed through a range encoder, which compresses it further and generates the encoded payload. Given that payload, the decoder performs the inverse steps to produce the output audio signal.

Figure 3: High level MLow encoder and decoder architecture.
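The band-splitting step can be illustrated with the simplest two-band analysis/synthesis pair, a Haar filter bank. This is only a structural sketch: MLow's actual filters are not described here, production codecs use much longer filters, and the function names are hypothetical. It shows that each band runs at half the input rate and that the decoder side can recombine the bands into the original signal.

```python
def split_bands(pcm):
    """Analysis: split an even-length sample list into half-rate low and high bands."""
    if len(pcm) % 2:
        raise ValueError("expected an even number of samples")
    # Low band: average of each sample pair (the smooth part of the signal).
    low = [(pcm[i] + pcm[i + 1]) / 2 for i in range(0, len(pcm), 2)]
    # High band: half the difference of each pair (the fast-changing part).
    high = [(pcm[i] - pcm[i + 1]) / 2 for i in range(0, len(pcm), 2)]
    return low, high


def merge_bands(low, high):
    """Synthesis: rebuild the full-rate signal from the two bands."""
    pcm = []
    for l, h in zip(low, high):
        pcm.extend((l + h, l - h))
    return pcm


samples = [4, 2, 10, 10, -6, -8, 0, 2]
low, high = split_bands(samples)
print(low)   # one value per input pair
print(high)  # small values wherever neighbouring samples are similar
assert merge_bands(low, high) == samples  # splitting alone loses nothing
```

In this toy version the split itself is lossless. The savings in a real codec come from how coarsely each band can be quantized afterwards.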

With these split-band optimizations, we are able to encode the high band using very few bits, which lets MLow deliver SuperWideBand audio (32 kHz sampling) at a much lower bitrate.
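For context on the band names used in this post, the highest audio frequency a sampled signal can represent is half its sampling rate (the Nyquist limit). The sketch below applies that rule. The 32 kHz SuperWideBand rate comes from the text, the 8 kHz NarrowBand rate is inferred from the 0 – 4 kHz range quoted earlier, and the 16 kHz WideBand rate is the conventional value, which is an assumption here because the post does not state it.

```python
def audio_bandwidth_khz(sample_rate_khz: float) -> float:
    """Highest representable audio frequency: half the sampling rate (Nyquist limit)."""
    return sample_rate_khz / 2


# Sampling rates in kHz; see the note above on which values are assumptions.
MODES = {"NarrowBand": 8, "WideBand": 16, "SuperWideBand": 32}

for name, rate in MODES.items():
    print(f"{name}: {rate} kHz sampling covers 0-{audio_bandwidth_khz(rate):g} kHz")
```

SuperWideBand therefore covers four times the frequency range of the NarrowBand mode that Opus falls back to at 6 kbps.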

What’s next?

MLow has greatly enhanced audio quality on low-end devices while still ensuring calls are end-to-end encrypted. We are really excited about what we have accomplished in just the last two years, from developing a new codec to successfully shipping it to billions of users around the globe. We’re continuing to work on improving audio recovery on networks with heavy packet loss by sending more redundant audio, which MLow allows us to do efficiently. We’re excited to share more as we continue working to make it easier for all our users to make quality audio calls.