Data @Scale - Boston Recap - Engineering at Meta

Last week, we hosted Data @Scale – Boston, our first @Scale conference outside of California. We had awesome talks from ConstantContact, DataXu, Facebook, HP Vertica, Infinio, Oracle, Tokutek, TripAdvisor, Twitter, VoltDB, and Wayfair, and brought together 175 engineers from the Boston tech community.

Building applications and services that scale to millions or even billions of people presents a complex set of engineering challenges, many of them unprecedented. The @Scale community is focused on bringing people together to openly discuss these challenges and collaborate on the development of new solutions.

Presentation Summary and Videos

Welcome

Featuring: Ryan Mack, Boston Site Lead at Facebook

Twitter: Scaling Crashlytics answers – Real time high-volume analytics processing w/ the lambda architecture

Featuring: Ed Solovey, Staff Software Engineer at Twitter

Description: In the fifteen seconds that it takes you to read this, Answers will have processed 12,000,000 events in support of its actionable, insightful, and real-time mobile analytics dashboards. Learn how it leverages the lambda architecture and probabilistic algorithms to handle this influx of information. We’ll dig into how the two pillars of the lambda architecture – off-line, batch processing and real-time, stream-compute processing come together to help us achieve a scalable, fault-tolerant, real-time data processing system.

Facebook: Lookback videos

Featuring: Goranka Bjedov, Capacity Engineer at Facebook

Description: On February 4th 2014, Facebook celebrated its 10th anniversary by releasing the Look-Back Videos product. Every Facebook user was given a 62 second video of their most important events over the course of their Facebook presence. If the users didn’t have a lot of activity, a cover page was generated instead. As reported in the press and on Facebook engineering blog, this project was realized in less than a month worth of time.

This talk will focus on discussing technical challenges related to the project, some of which include: compute, network, storage, distribution, projections and modeling. But, more importantly, the talk will focus on which parts of infrastructure enabled successful undertaking on a project of this size. What are the infrastructure pieces that had to be in place to make this happen? What are the necessary parts that enable fast product development on a massive scale, while at the same time keeping the risk to the remainder of the service acceptable? How can you plan and execute a project of this magnitude and how do you mitigate risks?

Vertica: DBD: An Automated Design Tool for Vertica

Featuring: Vivek Bharathan, Software Engineer at HP Vertica

Description: Query performance in any database system is heavily dependent on the organization and structure of the data. The task of automatically generating an optimal design becomes essential when dealing with large datasets. Vertica is a distributed, massively parallel columnar database that physically organizes data into projections. Projections are attribute subsets from one or more tables with tuples sorted by one or more attributes, that are replicated on or distributed across cluster nodes.

The key challenges involved in projection design are picking appropriate column sets, sort orders, cluster data distributions and column encodings. In this talk we shall discuss Vertica’s customizable physical design tool, called the Database Designer (DBD), that is tasked with producing projection designs that are optimized for various scenarios and applications. In particular, we will focus on the challenges that the DBD faces, and its evolution over the years.

Wayfair: Scaling Redis and Memcached at Wayfair

Featuring: Ben Clark, Chief Architect at Wayfair

Description: At Wayfair, we had to take the caching layer for our customer-facing web sites from a simple primary/secondary pair of Memcached nodes in 2012, to a set of consistently hashed clusters of in-memory cache servers and persistent key-value stores, in multiple data centers, in time for the holiday rush of 2013. Building on the work of giants and innovators, particularly Akamai, Last.fm, and Twitter, we used composable tools: Memcached, Redis, Ketama, Twemproxy, Zookeeper, to create a resilient distributed system. It’s big. Well. That’s always relative. Maybe that’s too bold a claim, considering some of the others speakers at this conference. Let’s say it seems big to us, and we’ve been through some explosive growth over the last few years! It’s definitely inexpensive, strong, and fast, and Ben Clark will describe our techniques and add-ons, which are available on github, and explain how to do it yourself.

TripAdvisor: Benefits of Big Data: Handling Operations at Scale

Featuring: Don O’Neill, VP of Engineering, Operations and Infrastructure at TripAdvisor

Description: Don O’Neill from TripAdvisor presents Big Data business lessons learned from handling operations on a site with over 280 million unique visitors every month – discussing Hadoop, log shipping and analytics, new operations monitoring, and anomaly detection.

Facebook: Geo-spatial Features in RocksDB

Featuring: Igor Chanadi, Database Engineer at Facebook; and Yin Wang, Research Scientist at Facebook

Description: RocksDB is an embeddable key-value store optimized for fast storage. It is based on Log-Structured merge-tree (LSM) architecture and is widely used across Facebook’s services. In this talk we’ll present how we implemented spatial indexing on top of RocksDB’s LSM architecture, which enables us to efficiently store geo-spatial data in RocksDB. We’ll also discuss how we optimized the spatial indexing for bulk-load, read-only and read-mostly workloads. Finally, we’ll talk about how we use the geo-spatial features to build database serving OpenStreetMaps data, which can then be used to render map tiles using Mapnik.

Vertica: Vertica Live Aggregate Projection

Featuring: Nga Tran, Software Engineer at HP Vertica

Description: Live Aggregate Projections (LAPs) are a new type of projection, introduced in HP Vertica 7.1, that contain one or more columns of data that have been aggregated from a table. The data in a LAP are aggregated at load time, thus querying it is several times faster than applying the aggregation directly on the table.

Unlike other databases which do not support incremental maintenance of non-distributive aggregate functions (e.g. MIN and MAX) for updates and deletes, incremental maintenance is possible in Vertica because the data in LAPs are *partially* aggregated.

This talk focuses on the implementation of Vertica LAPs, for which the current supported aggregate functions are (distributive) SUM, COUNT and (non-distributive) MIN, MAX, and TOP-K. By aggregating data per load on each partition, Vertica allows incremental maintenance on both distributive and non-distributive aggregation while allowing data to be deleted.

Facebook: Data @ Scale Building Scalable Caching Systems With Mcrouter

Featuring: Anton Likhtarov, Software Engineer at Facebook

Description: Modern large scale web infrastructures rely heavily on distributed caching (e.g memcached) to process user requests. These caches serve as a temporary holding spot for the most commonly accessed data. However, this makes these services very sensitive to cache performance and reliability. Mcrouter is the lynchpin of Facebook’s caching infrastructure. It handles the basics of routing requests to the appropriate hosts and managing the responses in a highly performant way. In addition, there are a lot of features in mcrouter that have been designed to dramatically improve the reliability of the caching infrastructure. The problems that mcrouter addresses are not specific to Facebook, but distributed caching systems in general. As a result, Instagram and Reddit have also adopted mcrouter as the primary communication layer to their cache tiers. Mcrouter is open source software and we hope it will be useful in many other applications that rely on caching. This talk gives a very brief overview of mcrouter and the basics of integrating it into different pieces of infrastructure.

F4 – Photo Storage at Facebook

Featuring: Joe Gasperetti, Production Engineer at Facebook; and Satadru Pan, Software Engineer at Facebook

ConstantContact: Scaling Cassandra and MySQL

Featuring: Stefan Piesche, CTO at Constant Contact

Description: CTCT used to scale data vertically in large DB2 databases attached to even larger SANs. Since this is not only cost prohibitive but poses significant scalability and availability issues, we have now 2 primary other data strategies.

Cassandra. We use Cassandra as a horizontally scalable data tier for key/value type data. We have around 350 Cassandra nodes spanning 2 data centers. That systems provides 10x the performance of the old RDBMS and 1/10th of the cost. This system is our consumer event tracking systems that scales to 100TB of data, 150BN records that arrive at a velocity of 10k/sec.

Sharded mysql. Our largest deploy is a 36TB system spanning 2 data centers. But, instead of just sharding the DB tier, we even shard the application tier using that system in order to provide complete transparency of the sharding mechanism. Our SOA allows for RESTful access of that data, without any knowledge of the underlying sharding mechanism. However, we have learned that this led to a substantial underutilization of the app tiers – a 96 node cluster of a Ruby Rails application – so we are looking into proprietary DB level sharding mechanisms as well.

The mixture of RDMBS and NOSQL data tiers has caused issues in our analytics platform, a 150TB Hadoop cluster. We use similar mechanism like Netflix does to read data from Cassandra nodes – reading from the SSTables to extract the data.

VoltDB: H-Store/VoltDB architecture vs. CEP systems and newer streaming architectures

Description: In 2007, researchers at MIT, Brown and Yale set out to build a new kind of relational database called H-Store. Commercially developed as VoltDB, it was suddenly possible to build applications that did millions of transactional operations per second at very low cost and with high fault tolerance. While suitable for micro-payments and other high volume, traditional transactional work, many early customers built systems for stream processing. As the product evolved, more and more features were added to support streaming, event processing and ingestion workloads, including materialized views, Kafka ingestion and push-to-HDFS data migration. This talk will explain, through customer use-cases and some development backstory, how the H-Store/VoltDB architecture compares to CEP systems and newer streaming architectures like Storm and Spark Streaming.

Facebook: Using Graph Partitioning in Distributed Systems Design

Featuring: Alon Shalita, Software Engineer at Facebook; and Igor Kabiljo Software Engineer at Facebook

Description: Large graph datasets, like online social networks or the world wide web, introduce new challenges to the field of systems design. Their size requires scaling resources horizontally by splitting data and queries across several computation units, but standard sharding and routing schemes that ignore the inherent graph structure of the datasets result in suboptimal performance characteristics. In this talk, we present an efficient distributed algorithm for graph partitioning, the problem of dividing a graph into equally sized components with as few edges connecting these components as possible, and show how its results can be used for optimizing distributed systems serving graph based datasets.

Infinio: Making Enterprise Software That’s As Easy To Install As Dropbox

Featuring: Martin Martin, Software Architect at Infinio

Description: There’s a lot of great software that does cool stuff, but when it comes to software that is deeply embedded in your infrastructure, all too often it’s too much of a project to deploy and try it out. We confronted the problem of intercepting and modifying the data stream between all the virtual machines in a data center and the backend storage arrays that host their virtual disks. This talk describes how networking works at the TCP and link levels, and how we subvert that to make installation so easy and nondisruptive that you could try out our product over lunch.

Facebook: Scalable Collaborative Filtering on top of Apache Giraph

Featuring: Maja Kabiljo, Software Engineer at Facebook

Description: Apache Giraph is a highly performant distributed platform for doing graph and iterative computations. Collaborative filtering is a well known recommendation technique that is often solved with matrix-factorization based algorithms. This talk will detail our scalable implementation of SGD and ALS methods for collaborative filtering on top of Giraph. We will describe our novel methods for distributing the problem and the related Giraph extensions that allows us to scale to over a billion people and tens of millions of items. We will also review various additions required for handling Facebook’s data (for example, implicit and skewed item data). Finally, to complete our easy to use and holistic approach to scalable recommendations at Facebook, we detail our approach to quickly finding top-k recommendations per user.

Tokutek: Fractal Tree Indexing in MySQL and MongoDB

Featuring: Tim Callaghan, VP of Engineering at Tokutek

Description: As transactional and indexed reporting data sets continue to grow, traditional B-tree indexing struggles to keep up, especially when the working set of data cannot fit in RAM. Fractal Tree indexes were purpose built to overcome this limitation, while retaining the read properties we expect for our queries. We’ll start by covering the theoretical differences between the two indexing technologies. We’ll end the talk by discussing the benefits that Fractal Tree indexes bring to MySQL (TokuDB) and MongoDB (TokuMX). “Benchmarks or it doesn’t count”, so expect to see a few.

Facebook: Cold Storage at Facebook

Featuring: Ritesh Kumar, Software Engineer at Facebook

Description: Cold Storage is an internally used Exabyte scale archival storage system developed completely in-house at Facebook. We discuss some of the salient design features of the cold storage stack and how it fits into the specific low power hardware requirements for cold storage and its unique workload characteristics. We will discuss multiple aspects of the software stack including methods to practically keep storage very durable, highly efficient, and handling realistic operations such as handling incremental cluster growth and tolerating a myriad of hardware failures at scale.

DataXu: Scaling to over 1,000,000 requests per second

Featuring: Beth Logan, Senior Director of Optimization at DataXu

Description: DataXu’s decisioning technology handles over 1,000,000 ad requests per second. To put this into context, Google Search handles 5,000 to 10,000 transactions per second and Twitter handles 5,000 to 7,000 transactions per second. Behind this statistic is an incredible architecture that has enabled us to scale. We use a blend of open source and home grown tools to place ads, record their impact and learn and deploy our decisioning models automatically, all while running 24×7 in over 30 countries worldwide. In this presentation we will dive into some of these tools and discuss the challenges we faced and the tradeoffs we made.

Vertica: Data Movement for Distributed Execution

Featuring: Derrick Rice, HP Vertica

Oracle: Query Evaluation Using Dynamic Code Generation

Featuring: Magnus Bjornsson, Senior Director of Engineering at Oracle

Description: Typical query evaluation in a database uses a static evaluator which is built to handle all types of queries. For performance reasons it has become more and more common in recent years to dynamically build the evaluator based on the query itself (using JIT compilation). In this talk I’ll talk about the approach that we at Oracle/Endeca took in our own columnar, in-memory data store to dynamically generate the query evaluator at query time.

Thank you

Featuring: Ryan Mack, Boston Site Lead at Facebook

In an effort to be more inclusive in our language, we have edited this post to replace the terms “master” and “slave” with “primary” and “secondary”.