Here at Facebook, we’re constantly facing scaling challanges because of our enormous growth. One particular problem we encountered a couple of years ago was collection of data from our servers. We were collecting a few billion messages a day (which seemed like a lot at the time) for everything from access logs to performance statistics to actions that went to News Feed. We used a variety of different technologies for the different use cases, and all of them were bursting at the seams. We decided to build a unified system (called Scribe) to handle all of these cases, and do it in a way that would scale with Facebook’s growth.
The system we built turned out to be enormously useful, handling over 100 use cases and tens of billions of messages a day. It has also been battle tested by just about anything that can go wrong, so I encourage you to take a look at the newly opened Scribe source and see if it might be useful for you. To give the code some context, I’m going to go through the major design decisions we made to allow the system to scale. The first decision we made was to not lock ourselves into a particular network topology. The Scribe servers are arranged in a directed graph, but each server only knows about the next server in the graph. This flexible topology allows for things like adding an extra layer of fan-in if the system grows too large, and batching messages before sending them between datacenters, but without having any code that explicitly needs to understand datacenter topology, only a simple configuration.
The second major design decision was about reliability. We chose was a middle ground here, reliable enough that we can expect to get all of the data almost all of the time, but not reliable enough to require heavyweight protocols and disk usage. More specifically, Scribe spools data to disk on any node to handle intermittent connectivity node failure, but it doesn’t sync a log file for every message, so there’s a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. Basically, this is more reliability than you get with most logging systems, but not something you should use for database transactions.
As it turned out, this is a reasonable level of reliability for a lot of use cases, and has made scaling much easier. It’s also the source of a lot of the hard-learned lessons: getting the system to catch up seamlessly after a significant network problem is tricky, especially when there are tens or hundreds of gigabytes of data backed up. The final design decision was about the data model. When you’re building something that looks like a logging system there are a lot of things people expect: logging levels and rules about when they get sent, timestamping and ordering of messages, schemas for common messages, etc.
We decided that this was a can of worms that shouldn’t be mixed up with the asynchronous and mostly reliable delivery of data, so we made the data model very simple. A message is two strings: a category and the actual message. The category is the description of what the message is about, and the expectation is that messages of the same category end up in the same place. The message is the actual data to be logged. We also don’t have any a priori list of categories that must be maintained. If you create a new category it shows up at a new file. This is following the Unix philosophy of doing exactly one thing and doing it well, and it has definitely paid off in ease of use and development. We started with four or five use cases in mind and now we have hundreds, but we didn’t have to modify the Scribe source for any of them.
Another choice we made early on was to build Scribe using Thrift. This sped up development enormously because a lot of the hard parts were already taken care of, and it also made the resulting system much more flexible. We currently log messages to Scribe from PHP, python, C++, and Java code, and the list of possible languages is growing all the time from the contributions of developers around the world. So Scribe has already benefitted enormously from Thrift being open, and it will be even better having Scribe open too. I hope you find it as useful as we do.