On Thursday, June 8, we welcomed 350 engineers to Seattle for the third consecutive year of Data @Scale.
In addition to focusing on the challenges of building services and solutions for large-scale storage systems and analytics, this year’s speakers from Facebook, Google, LinkedIn, Microsoft, Pinterest, Uber, and Yandex examined the ways in which Big Data is transforming machine learning, even as new machine learning techniques are leading to an evolution in infrastructure, hardware engineering, and data center design.
Before the day’s presentations, we hosted a Women in Engineering Breakfast & Panel, where attendees could connect with fellow researchers and industry leaders who share a passion for technology and join a discussion with panelists from Amazon, Facebook, and Google.
For a recap of the conference and the presentations, check out the videos below. The @Scale community is focused on bringing people together to openly discuss these challenges and collaborate on the development of new solutions. If you’re interested in joining the next event, visit the @Scale website or join the @Scale community.
Accelerating Machine Learning for Computer Vision
Pieter Noordhuis, Facebook
Facebook engineer Pieter Noordhuis shares insights from a newly released paper, “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.” The paper demonstrates how creative infrastructure design can contribute to more efficient deep learning at scale.
Next Generation of Globally-Distributed Databases in Azure
Rimma Nehme, Microsoft
Rimma describes the next generation of globally distributed databases at Microsoft. These databases can run on millions of nodes across hundreds of data centers and handle up to trillions of data objects, 24/7 – all backed by industry-leading comprehensive SLAs.
Yandex ClickHouse: A DBMS for Interactive Analytics at Scale
Alexey Milovidov, Yandex
In his session, Alexey walks through the development of ClickHouse and explains how an iterative approach to data storage organization resulted in a system that can ingest clickstream data in real time, generate interactive reports on non-aggregated data, and process 100 billion rows per second on HDDs. The system scales linearly, supports a SQL dialect, and is open source.
Evolution of Storage and Serving at Pinterest
Yongsheng Wu, Pinterest
Yongsheng covers the evolution of storage and serving at scale as Pinterest grows. He shares insight into building a machine learning serving platform that addresses a new challenge: serving feeds efficiently and with very low latency when they depend on complicated machine-learned ranking models and on features scattered across many data sets.
Cadence: Micro-service Architecture Beyond Request/Reply
Maxim Fateev, Uber
Uber’s Maxim Fateev offers a technical review of Cadence, an open source solution for building and running micro-services that expose asynchronous, long-running operations in a scalable and resilient way. Cadence borrows ideas from the AWS Simple Workflow service, is written in Go, and relies on Cassandra for storage.
How Reporting and Experimentation Fuel Product Innovation at LinkedIn
Kapil Surlaker, LinkedIn
Kapil describes UMP and XLNT, platforms built for metrics computation and experimentation, respectively. Over the last few years, these platforms have allowed LinkedIn to perform measurement and experimentation efficiently at scale while preserving trust in data.
Spanner’s SQL Evolution
Sergey Melnik, Google
Sergey offers a look at the technical challenges behind Spanner, a globally distributed data management system that backs hundreds of mission-critical services at Google.
Spanner is built on ideas from both the systems and database communities. Initially, Spanner focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. More recently, Google has been working on turning Spanner into a SQL DBMS.
Sergey describes distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. He touches upon migrating Spanner to the common SQL dialect shared with other systems at Google.
Architectures for the New Era of Cloud Specialization
Doug Burger, Microsoft
Doug Burger walks the audience through some of Microsoft’s efforts on large-scale deployments of programmable hardware in the Microsoft cloud, covering both the hardware and the resource management interfaces. He frames this work against the approaching end of Moore’s law and ever-increasing computational needs, driven in part by big data and machine learning, that will force systems to become more heterogeneous.
Bulk Data Movement Serving Facebook’s Global Data Storage and Processing
Steve Stroiney, Facebook
Steve closes the conference with a talk describing Facebook’s system for bulk data movement across storage systems worldwide.