Building applications and services that scale to hundreds of millions or even billions of people around the globe often brings with it an unprecedented set of challenges in the domain of data processing systems. More people and more engagement usually leads to more data, more updates, and more queries. How growing companies handle this can make or break the experience people have with their products.
Building on that, the rich variety of consumer experiences today introduces new data models and new query and update patterns — and this in turn has an impact on both online data processing systems and analytics systems. The growth of mobile clients and applications adds on yet another dimension: by providing stateful devices with variable connectivity, client interactions with the backend data systems need to be designed and optimized differently than they would be for the web.
The Data track at the @Scale 2014 conference brought together the community that is tackling these problems, with talks from Box, Facebook, Netflix, Pinterest, and YouTube. Check out the talks to hear about how these companies are pushing the boundaries in every layer of the stack across cache, database, logging, and analytics systems.
Presentation Summary and Videos
Growing Facebook on Mobile, a Real-Time Analytics Story
In this session, Facebook engineer Weizhe Shi started out by providing context on the analytics ecosystem at Facebook before this project began — the team had tools to rapidly iterate on their web product but didn’t have that same ability for mobile, even though a large number of people had started accessing Facebook through mobile devices. By combining limited releases with a real-time analytics pipeline, he explained how they filled that gap. He detailed the challenges faced in collecting analytics data on mobile and the frameworks built to overcome them. Anshul Jaiswal followed up with the challenges faced in processing and analyzing this data. He presented the framework they built for rapid evolution of analytical tools to understand the health of mobile releases in real time. Will Wirth covered specific examples of how this framework was used to make Facebook's mobile applications stable, performant, and engaging.
Data Platform Architecture, Evolution, and Philosophy at Netflix
Virtually all of Netflix’s data infrastructure is in the AWS cloud, and S3 is the central data hub. Netflix’s Kurt Brown delved into the framework of Netflix’s data platform in S3 and how the various technologies work together. He also previewed many open source tools, including Inviso, a job search and visualization tool, and Kragle, a big data API. Justin Becker wrapped up the talk with a deep dive into Mantis, a reactive stream processing system that provides real-time insights to ensure that Netflix works for every user every time across many different dimensions (e.g. device, location, bitrate).
Zen: Pinterest’s Graph Storage Service
The session opened with Raghavendra Prabhu, engineering manager at Pinterest, who emphasized the importance of building infrastructure abstractions to enable product engineers to iterate very quickly. He then introduced Zen, which provides a graph data model abstraction on top of a database to provide storage as a service for various products at Pinterest. Xun Liu closed the talk with an explanation of Zen internals and production learnings.
Building Scalable Caching Systems via mcrouter
Facebook engineer Rajesh Nishtala announced the open sourcing of “mcrouter,” a memcached protocol router for scaling cache deployments. It manages all traffic to, from, and between thousands of cache servers in dozens of clusters in Facebook data centers. He provided an overview of Facebook's caching infrastructure — which processes more than 4 billion operations per second — and deep-dived into a few specific scalability and reliability features of the technology. Ricky Ramirez of Reddit wrapped up the talk by previewing how mcrouter is used at Reddit, which automatically scales between 170 and 300 app servers every day. Check out the blog post for more details on mcrouter.
Structured Data at Box: How We’re Building for Scale
With over 6 billion third-party API calls per month, Box has to cater to a diverse set of users. Senior Engineering Manager Tamar Bercovici detailed the unique set of challenges in building infrastructure for the enterprise cloud, such as availability and consistency of the system, and how Box has tackled some of these issues with Tron, which provides a proxy to memcached servers, and Credence, a new scalable relational data store.
Facebook’s A/B Platform Interactive Analysis in Real Time
Itamar Rosenn from Facebook summarized how the company conducts A/B tests using an automated analysis tool called Deltoid. He then previewed the next generation of Deltoid, which provides real-time A/B test results and enables interactive slice-and-dice exploration of the data.
MySQL for Messaging
Harrison Fisk detailed Facebook’s new mobile-oriented backend for Messenger. The system, called Iris, uses a queueing design and tiered storage, with MySQL providing the new flash layer. To use MySQL, several new MySQL 5.6 features were deployed, including parallel replication and semi-synchronous replication. More details of the system are available at this blog post.
Scaling YouTube’s Backend: The Vitess Trade-Offs
YouTube’s Sugu Sougoumarane concluded the day with a talk on Vitess, a scalable open source storage solution based on MySQL, that YouTube developed to make data storage manageable as the service scaled. He discussed a number of the system's features, such as the Vitess lock server, which houses global information vital to all servers, and vttablet, which turbocharges MySQL and rewrites problematic queries.
This wraps up the last of our recaps from the @Scale 2014 event we held in September. Check back here for more news about our @Scale conference series.