This week we’re publishing a series of posts looking back at the technologies and advancements Facebook engineers introduced in 2017. Our previous installments focused on data centers and connectivity.
We know that many people find communities on Facebook based on common geography, interests, or causes they care about. One way that Facebook is helping people build these connections is by breaking down language barriers with machine translation.
Providing fast, fluent translations is a complex challenge. People use dozens of languages on Facebook, and more than 50 percent of the 2 billion people on Facebook speak a language other than English. This adds up to more than 2,000 translation directions, and our translation models have to account for the nuanced ways people use language, including slang, abbreviations, context, intent, and even typos. This year, we evolved our translation models from a phrase-based system to one underpinned by neural networks, which take into account the entire context of a sentence and provide more accurate translations. Our neural net system powers more than 4.5 billion translations every day on posts and comments, helping people connect with content that isn’t written in their preferred language. The Facebook Artificial Intelligence Research and Applied Machine Learning teams are continuing their work in this space, and this year they demonstrated neural nets that achieve even higher translation accuracy at faster speeds.
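To make the shift from phrases to full-sentence context concrete, here is a minimal sketch of an attention-based encoder-decoder in PyTorch. It is not our production architecture; the model sizes, vocabulary sizes, and class names are placeholders. The point it illustrates is simply that each output word can attend over the entire source sentence rather than a local phrase.

```python
# Minimal sketch (not the production system): a sequence-to-sequence
# translator where every target word attends over the whole source sentence.
import torch
import torch.nn as nn

class TinyTranslator(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.attn_proj = nn.Linear(2 * dim, dim)  # fold bi-GRU states back to dim
        self.out = nn.Linear(2 * dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the full source sentence; the bidirectional pass lets every
        # position see the entire sentence context.
        enc_states, _ = self.encoder(self.src_embed(src_ids))       # (B, S, 2D)
        enc_states = self.attn_proj(enc_states)                     # (B, S, D)

        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids))       # (B, T, D)

        # Dot-product attention: each target step weights all source positions.
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))  # (B, T, S)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_states)                    # (B, T, D)

        return self.out(torch.cat([dec_states, context], dim=-1))   # (B, T, V)

# Toy usage: next-token logits for a batch of token-id tensors.
model = TinyTranslator(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (2, 12)), torch.randint(0, 8000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 8000])
```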
We’ve also been working to improve computer vision capabilities toward the goal of understanding visual content at the pixel level. We’ve applied this technology to automatic alt text on photos to provide more descriptive image captions for people who are visually impaired, and announced a better image search system that leverages content understanding to surface the most relevant photos quickly and easily.
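As a rough illustration of how pixel-level understanding can feed a caption, the sketch below runs an off-the-shelf instance-segmentation model from torchvision and turns its detected labels into a simple description. The label table, score threshold, and wording are hypothetical, and this is not the alt text pipeline we ship.

```python
# Hypothetical sketch: pixel-level understanding here is an off-the-shelf
# instance-segmentation model whose detected labels become a rough caption.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn  # torchvision >= 0.13

COCO_LABELS = {1: "person", 3: "car", 18: "dog"}  # tiny excerpt for the demo

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def describe(image_tensor, score_threshold=0.7):
    """Return a rough alt-text string for a (3, H, W) float image in [0, 1]."""
    with torch.no_grad():
        pred = model([image_tensor])[0]
    names = [COCO_LABELS.get(int(label), "object")
             for label, score in zip(pred["labels"], pred["scores"])
             if score > score_threshold]
    if not names:
        return "Image"
    return "Image may contain: " + ", ".join(sorted(set(names)))

print(describe(torch.rand(3, 480, 640)))
```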
Advances in computer vision also enabled us to use artificial intelligence as a creative tool in developing augmented reality and virtual reality experiences. Our Applied Machine Learning team collaborated with a San Francisco-based artist to build and deploy an AR experience that used SLAM (simultaneous localization and mapping) and standard smartphone cameras to overlay digital versions of her artwork on real-world objects and scenes. The result was a virtual art installation that conforms to the environment, showing the creative potential of a technology that’s most often associated with driverless cars.
Last year, we introduced the concept of style transfer — using AI to apply a set of artistic qualities to images and videos in real time. This year, we took that capability to a new level and applied style transfer to 4K 360 video for a VR film. Our style transfer algorithm was originally designed to fit on a mobile phone, but the computational memory and time requirements for stereoscopic VR dictated a new approach. What started out as an experiment to test the limits of AI as a creative tool turned out to be a resounding success — the film premiered at the Tribeca Film Festival in May and won the Special Jury Prize at the Paris Virtual Film Festival. It’s now available on the Oculus Store for both Rift and Gear VR.
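For readers unfamiliar with the technique, the sketch below shows the classic style loss at the heart of neural style transfer: matching Gram matrices of CNN feature maps. It uses torchvision's VGG16 for convenience, and the layer choice and image sizes are illustrative; the real-time mobile and VR pipelines described above use their own optimized models.

```python
# Illustrative only: the Gram-matrix style loss from classic neural style
# transfer, computed over VGG16 features (torchvision >= 0.13).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

features = vgg16(weights="DEFAULT").features[:16].eval()  # up to relu3_3
for p in features.parameters():
    p.requires_grad_(False)

def gram_matrix(feat):
    # Correlations between feature channels capture "style" (texture, color).
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(generated, style_image):
    return F.mse_loss(gram_matrix(features(generated)),
                      gram_matrix(features(style_image)))

# Usage: penalize a candidate frame for drifting from the reference style.
# (In practice, inputs would be ImageNet-normalized photos, not random noise.)
frame = torch.rand(1, 3, 256, 256, requires_grad=True)
style = torch.rand(1, 3, 256, 256)
loss = style_loss(frame, style)
loss.backward()
```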
We’re always looking for ways to make 360 content seamless and easily accessible, especially under the hardware constraints of mobile devices. This year, we introduced several end-to-end optimizations for the 360 video experience. An evolution of our 360 geometry — offset cube maps — allows us to switch resolutions depending on viewing and network conditions, decreasing bit rates by 92 percent. Tweaks to how we calculate the effective quality of hundreds of possible streams under a variety of conditions reduced buffering interruptions by 80 percent.
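The selection logic can be pictured roughly like this: each offset-cube-map variant concentrates quality in one direction, and the player picks the affordable variant best aligned with the viewer's gaze. The bitrates, orientations, and headroom factor below are made up for illustration and are not our actual encoding ladder.

```python
# Hedged sketch of viewport-adaptive stream selection for offset cube maps.
# All numbers and field names are illustrative.
from dataclasses import dataclass
import math

@dataclass
class Variant:
    yaw_deg: float      # direction this offset cube map favors
    pitch_deg: float
    bitrate_kbps: int

def angular_distance(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle (degrees) between two view directions."""
    y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
    cos_angle = (math.sin(p1) * math.sin(p2) +
                 math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def pick_variant(variants, gaze_yaw, gaze_pitch, throughput_kbps, headroom=0.8):
    # Keep only variants the network can sustain (with some safety headroom),
    # then choose the one pointed closest to where the viewer is looking.
    affordable = [v for v in variants if v.bitrate_kbps <= throughput_kbps * headroom]
    if not affordable:
        affordable = [min(variants, key=lambda v: v.bitrate_kbps)]
    return min(affordable,
               key=lambda v: angular_distance(v.yaw_deg, v.pitch_deg,
                                              gaze_yaw, gaze_pitch))

ladder = [Variant(yaw, 0.0, kbps)
          for yaw in (0, 90, 180, 270) for kbps in (1500, 4000)]
print(pick_variant(ladder, gaze_yaw=75.0, gaze_pitch=10.0, throughput_kbps=6000))
```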
We enhanced the streaming experience further with view prediction models based on gravitational physics and AI, which were better than previous models at predicting where to deliver the highest concentration of pixels and improved resolution by up to 39 percent. We also began testing a content-dependent streaming technique that processes every frame in a video to determine the points of interest in a scene and streams the highest resolution in that direction.
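A toy version of that kind of predictor might look like the following, extrapolating recent head motion while a "gravity" prior pulls the view back toward the horizon. The constants are invented for the sketch; the production models combine physics with learned components in ways this does not capture.

```python
# Toy head-motion predictor: damped extrapolation of angular velocity plus a
# bias toward the horizon. Constants are made up for illustration.
def predict_view(yaw, pitch, yaw_vel, pitch_vel, dt=0.5,
                 damping=0.6, gravity_pull=0.3):
    """Predict (yaw, pitch) in degrees, dt seconds ahead."""
    # Damped extrapolation of the current rotation rates.
    pred_yaw = (yaw + yaw_vel * damping * dt) % 360.0
    pred_pitch = pitch + pitch_vel * damping * dt
    # Viewers tend to settle back toward the horizon, so bias pitch toward 0.
    pred_pitch *= (1.0 - gravity_pull)
    return pred_yaw, max(-90.0, min(90.0, pred_pitch))

print(predict_view(yaw=40.0, pitch=25.0, yaw_vel=30.0, pitch_vel=-10.0))
```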
The full 360 experience includes immersive audio. This year, we announced an audio system that supports spatial audio and head-locked audio simultaneously — an industry first. The spatial audio system responds to the direction a person is looking, so they hear sound behind them if that’s where it’s happening in the scene, while the head-locked system keeps audio elements like narration and music static relative to the viewer.
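Conceptually, the mix works like the sketch below: spatialized sources are rotated against the viewer's head orientation before panning, while head-locked tracks are summed in unchanged. Real spatial audio relies on ambisonics and HRTFs rather than the simple constant-power stereo panning used here, so treat this as a picture of the split, not of the audio system itself.

```python
# Simplified two-layer mix: spatial sources follow the scene, head-locked
# tracks follow the listener. Stereo panning stands in for true spatialization.
import math
import numpy as np

def pan_stereo(mono, azimuth_deg):
    """Constant-power pan of a mono signal; 0 deg = front, +90 = right."""
    angle = math.radians((azimuth_deg + 90.0) / 2.0)  # map [-90, 90] -> [0, 90]
    left, right = math.cos(angle), math.sin(angle)
    return np.stack([mono * left, mono * right])

def mix_frame(spatial_sources, head_locked_stereo, head_yaw_deg):
    out = np.array(head_locked_stereo, dtype=float)    # unaffected by head turns
    for mono, source_azimuth in spatial_sources:
        relative = source_azimuth - head_yaw_deg       # scene stays put, head moves
        out += pan_stereo(np.asarray(mono, dtype=float),
                          max(-90.0, min(90.0, relative)))
    return out

# Toy usage: a voice 60 degrees to the viewer's right, narration head-locked.
voice = np.ones(4)
narration = np.zeros((2, 4))
print(mix_frame([(voice, 60.0)], narration, head_yaw_deg=30.0))
```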
Our goal is to bring exciting new experiences like these to everyone as quickly (and reliably!) as possible. We do this by building efficient and scalable software tools that let our engineers move fast and ship high-quality code. In our next installment, we’ll recap the improvements we’ve made in our software tooling across the stack to help engineers be productive at Facebook and beyond.