Facebook was an early adopter of Hadoop and Hive for our data needs. As our data volume grew and the pace of our product cycle increased, we needed to analyze data faster — so we created Presto. Presto is our distributed SQL engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto approaches the speed of commercial data warehouses while scaling to the size of our organization.
When we launched Presto, we saw dramatic query performance improvements across multiple internal Hadoop clusters. Many Facebook employees now use Presto daily, running many thousands of queries. We open-sourced Presto in 2013 and have invested heavily in building up the community.
How Airbnb, Dropbox, Netflix, and NASDAQ use Presto
Since making Presto available to others, we’ve seen a lot of growth, adoption, and support for it. Data-intensive companies like Airbnb, Dropbox, and Netflix use Presto as their interactive querying engine. On average, Netflix runs around 3,500 queries per day on its Presto clusters. NASDAQ and Netflix recently discussed their Presto usage on Amazon EMR in a talk at re:Invent.
MicroStrategy recently announced that it is supporting Presto in its MicroStrategy 10 release, and Teradata has joined the community to contribute to Presto as well.
Demonstrating the benefits of the collaborative open source community, Airbnb built a tool, Airpal, on top of the Presto engine to give more of its employees access to the data they need. When running ad hoc queries and iterating on the steps of an analysis, Airbnb found Presto to be much snappier and more responsive than traditional MapReduce jobs. Airbnb shared in a blog post, however, that the biggest benefit of incorporating Presto into its infrastructure stack was that the company didn’t have to add complexity to allow “interactive” querying. By querying against one central Hive warehouse, Airbnb can maintain a “single source of truth” with no large-scale copies to a separate storage/query layer.
Presto is supported on cloud-based platforms such as Amazon Elastic MapReduce (Amazon EMR), Qubole, and Treasure Data, which help organizations that lack significant in-house technical resources set up and run Presto quickly. Many customers of these platforms already run Presto in production.
Expansion beyond Silicon Valley
Presto is no longer our little secret in Silicon Valley. Following its rapid adoption among tech companies, Presto is now being deployed by leading companies outside the Valley and outside the United States. One example is Gree, a Japanese social media game-development company. According to a recent blog post, Gree found that Presto’s performance was outstanding compared with Hive’s, and that its Java stack and architecture allowed Gree’s engineers to customize it for better integration with the company’s own data center infrastructure and its on-premises Hadoop cluster. Another example is Kanmu, a Japanese startup in the financial services industry that provides card-based offers and uses Amazon EMR with Presto. Chinese e-commerce company JD.com has also adopted Presto and is using it for internal reporting and real-time big data processing for its customers. The company has collaborated with us to translate the Presto website into Chinese so that it’s accessible to a large number of users and developers.
The road ahead
We’re so pleased to see a wide range of companies interested in Presto: companies of varying size, maturity, industry, and geography that are energized by the Presto community. In March, there were 65 Presto contributors on GitHub; today, there are 94. We hope this vibrant, open, community-based project will continue to flourish, and that together we can take Presto to the next level.
There’s more coming up for Presto at Facebook. Check out these recent Meetup slides from a Boston event in October. For more on Presto, see the further reading below:
- Presto website
- MicroStrategy press release
- Blog post on open-sourcing Presto (2013)
- Airbnb Airpal blog post
- Netflix Presto blog post
- Gree Presto blog post
- Teradata Presto blog post
- Tutorial for setting up Airpal with Amazon EMR and querying data in S3 via Presto