Last year we began speaking at conferences around the world about our approach to managing hundreds of thousands of servers. We had outgrown our existing system and needed something new. We wanted a system that would let any engineer make any change they needed to any systems they owned via simple data-driven APIs while also scaling to Facebook’s huge infrastructure, and while also minimizing the size of the team that would have to own the system. We designed a new paradigm and built a framework to bring it to life. At the core of that framework is Chef — but the way we ended up using Chef is pretty unique. We wanted to share how and why we made those choices and the benefits they brought us.
People have been amazingly supportive of the tools we released, the ideas we presented, and the changes we proposed. We released some tools we thought were useful to the community, but we didn’t consider releasing our cookbooks because we believed they were too Facebook-specific. But a central theme in our talks was our cookbook design, and people started asking for them.
So recently we revisited that assumption about our cookbooks. In looking at how we built them, we realized that we had developed a different way of writing community-style cookbooks that succeeded — at least within Facebook — at doing what community cookbooks had failed to do outside of Facebook. And we started to wonder: Could this approach be applied not just to other organizations wanting to leverage our model, but also to writing community cookbooks the entire world can use?
We think it could, so today we want to share three things with you:
1) A model we believe would build better community cookbooks
2) Some docs and example cookbooks we’ve open-sourced
3) Several tools we’ve just open-sourced
The Facebook Chef Team will also be at the Chef Community Summit in Seattle October 2-3, so if any of this interests you, please come talk to us there.
Building better community cookbooks
Those of you who have used community cookbooks know that they tend to fall into one of two categories: They don’t offer enough control, or they are difficult or cumbersome to use.
We think this is due to a pattern common to most community cookbooks. They define a separate element for each thing you can set. So if the cookbook has, say, three settings for a configuration file, that’s very limiting. But if many settings are available it is hard to know which of the perhaps hundreds of settings need your attention and it can end up more complicated than the configuration itself.
We believe there’s a way to do something more generic that provides more flexibility with less complication.
Let’s assume you’re writing a cookbook for some package which has a single config file with 100 different options. The file is a simple format that just has keys and values separated by a space like this:
    Port 1234
    Address 127.0.0.1
    LogFile /var/log/foo.log
    SSL On
    SSLKey /etc/ssl/mysite.key
    SSLCert /etc/ssl/mysite.crt
    SSLCACert /etc/ssl/myca.cert
And so on. You don’t need to define them all — the software is distributed with a sample config file with everything commented out. When you want to change something, you typically would uncomment it, change the value, and restart the daemon.
A typical cookbook might define a template for this file that looks like:
    Port 1234
    Address 127.0.0.1
    LogFile <%= node['foo']['logfile'] %>
    LogSize <%= node['foo']['logsize'] %>
    <% if node['foo']['ssl'] %>
    SSL On
    SSLKey <%= node['foo']['sslkey'] %>
    SSLCert <%= node['foo']['sslcert'] %>
    SSLCACert <%= node['foo']['sslcacert'] %>
    <% end %>
But this format has many drawbacks: (1) The template is heavily tied into the exact keys the config file accepts. Future changes require changes to the template to be aware of the different formats of the file. (2) We’ve already added six variables, some of which are booleans and some of which are strings. (3) There are tons of configuration options we don’t support, and adding any others requires changing the template and increasing the complexity of using the cookbook.
What if we took a different approach? What if that template looked like this:
    <% node['foo']['config_file'].to_hash.each do |key, val| %>
    <%= key %> <%= val %>
    <% end %>
Now we support every single config entry that can exist, used to exist, or will exist in this config file, unless they change the syntax. If the next version adds six more configuration parameters, our cookbook doesn’t have to change. The API is very simple for users to understand — if you can put it in the config file, you can put it in the config_file hash. It’s flexible, easy to document, easy to maintain, and simple enough to be used by anyone.
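To see the generic template in action outside of a Chef run, here is a minimal sketch that renders the same loop with Ruby's standard ERB class. The plain hash stands in for node['foo']['config_file'].to_hash, and the names KV_TEMPLATE and render_kv are ours for illustration, not part of any cookbook.

```ruby
require 'erb'

# The same loop as the template above. With trim_mode '-', a tag closed
# with "-%>" swallows the newline that follows it, so control lines
# leave no blank lines in the output.
KV_TEMPLATE = <<~ERB_SOURCE
  <% config.each do |key, val| -%>
  <%= key %> <%= val %>
  <% end -%>
ERB_SOURCE

# Render a key/value config file from a plain Hash (assumption: this Hash
# stands in for node['foo']['config_file'].to_hash in a real Chef run).
def render_kv(config)
  ERB.new(KV_TEMPLATE, trim_mode: '-').result_with_hash(config: config)
end

puts render_kv('Port' => 1234, 'Address' => '127.0.0.1', 'SSL' => 'On')
# Prints:
# Port 1234
# Address 127.0.0.1
# SSL On
```

Adding a new option is just one more entry in the hash; the template never changes.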
This works for even relatively complicated config files. Let’s look at a more interesting example: my.cnf — the MySQL config file — an INI-format file. It has section headers in square brackets, and those sections are composed of key-value pairs. An example config:
    [mysqld]
    user = mysql
    pid-file = /var/run/mysqld/mysqld.pid
    socket = /var/run/mysqld/mysqld.sock
    port = 3306
    bind-address = 127.0.0.1

    [client]
    port = 3306
    socket = /var/run/mysqld/mysqld.sock
You get the idea. It turns out some lines are only a key with no value, such as:
    [mysqldump]
    quick
A template for this might look like:
    <% node['mysql']['my.cnf'].to_hash.each do |section, configs| %>
    [<%= section %>]
    <% configs.each do |key, val| %>
    <% if val.is_a?(FalseClass) %>
    <% next %>
    <% elsif val.is_a?(TrueClass) %>
    <%= key %>
    <% else %>
    <%= key %> = <%= val %>
    <% end %>
    <% end %>
    <% end %>
Now your API is something like:
node.default['mysql']['my.cnf']['mysqld']['user'] = 'mysql'
And that generates:
    [mysqld]
    user = mysql
Well, one assumes that your attributes file fills in a set of standard defaults, so your file would actually contain more than that — but the user line under [mysqld] will certainly have the value mysql.
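As a rough illustration of how booleans map onto the file, the sketch below renders an INI-style hash with Ruby's standard ERB class. It follows the same rules as the template above (true prints a bare key, false prints nothing), but it is a standalone approximation: the hash stands in for node['mysql']['my.cnf'].to_hash, and INI_TEMPLATE and render_ini are names we made up for this example.

```ruby
require 'erb'

# true  => bare key (e.g. "quick")
# false => line omitted entirely
# other => "key = value"
INI_TEMPLATE = <<~ERB_SOURCE
  <% sections.each do |section, configs| -%>
  [<%= section %>]
  <% configs.each do |key, val| -%>
  <% if val == true -%>
  <%= key %>
  <% elsif val != false -%>
  <%= key %> = <%= val %>
  <% end -%>
  <% end -%>
  <% end -%>
ERB_SOURCE

# Render an INI-style file from a Hash of section name => Hash of settings.
def render_ini(sections)
  ERB.new(INI_TEMPLATE, trim_mode: '-').result_with_hash(sections: sections)
end

puts render_ini(
  'mysqld'    => { 'user' => 'mysql', 'port' => 3306, 'skip-networking' => false },
  'mysqldump' => { 'quick' => true }
)
```

Setting a key to false is how a user would switch off a flag that the cookbook's defaults turned on.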
Sometimes a config can seem very difficult to support with this method, but there’s usually a very easy way. Perhaps you have a JSON-based config file? That’s even easier:
<%= JSON.pretty_generate(node['foo']['config_file'].to_hash) %>
By abstracting a config file into a data structure such as a hash or an array, you provide a simple, extensible API.
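For the JSON case, a small sketch with Ruby's standard json library shows that nested hashes and arrays pass straight through. The sample settings and the render_json_config name are invented for illustration.

```ruby
require 'json'

# Serialize a config Hash (standing in for node['foo']['config_file'].to_hash)
# as pretty-printed JSON with a trailing newline.
def render_json_config(config)
  JSON.pretty_generate(config) + "\n"
end

puts render_json_config(
  'listen'  => 8080,
  'logging' => { 'level' => 'info', 'outputs' => ['stdout', '/var/log/foo.log'] }
)
```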
Let’s make something even more complicated. Instead of the INI-style my.cnf, let’s look at Apache VirtualHosts. Since VirtualHosts can support a variety of syntaxes based on what modules are available, we’re going to simplify things a bit. Most options in an Apache VirtualHost take a key and either a single value or multiple space-separated values, and that’s what we’re going to support. Further, we’ll allow sub-sections (such as “Directory” and “Limit”) in our VirtualHosts that themselves follow the format of a VirtualHost. This makes for a reasonably complicated example we can still keep short.
An example config file might look like:
    <VirtualHost *:80>
      ServerName www.sample.com
      ServerAlias sample.com
      ErrorLog /var/log/apache/sample.com-error.log
      CustomLog /var/log/apache/sample.com-access.log combined
      DocumentRoot /var/www/sample.com
      <Directory /var/www/sample.com/>
        Options IncludesNoExec FollowSymLinks MultiViews
        Allow from all
      </Directory>
    </VirtualHost>
Let’s build a template!
    <% node['apache']['virtual_hosts'].to_hash.each do |name, config| %>
    <VirtualHost <%= name %>>
    <% config.each do |key, val| %>
    <% if val.is_a?(Hash) %>
      <<%= key %>>
    <% val.each do |subkey, subval| %>
        <%= subkey %> <%= subval.is_a?(Array) ? subval.join(' ') : subval %>
    <% end %>
      <%# Close with the directive name only, e.g. </Directory> %>
      </<%= key.split.first %>>
    <% else %>
      <%= key %> <%= val.is_a?(Array) ? val.join(' ') : val %>
    <% end %>
    <% end %>
    </VirtualHost>
    <% end %>
Note that in Apache you can have a further sub-section under Directory that this doesn’t support. We could add one more nested loop to handle that. Also note that this only generates VirtualHosts, and obviously there are other parts to an Apache config.
This template supports anything that follows this pattern, whether it’s RewriteRules, ScriptAliases, AuthName, or anything else.
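If deeper nesting matters, one option is to swap the fixed loops for a small recursive helper. The sketch below is plain Ruby rather than a Chef template, and render_directives and render_virtual_hosts are hypothetical names. It assumes a sub-section key carries its argument (for example 'Directory /var/www/sample.com/') and closes the section using only the first word of that key.

```ruby
# Render one level of directives. A Hash value becomes a nested section,
# so sub-sections can nest to any depth.
def render_directives(config, indent = 1)
  pad = '  ' * indent
  config.flat_map do |key, val|
    if val.is_a?(Hash)
      tag = key.split.first # "Directory /var/www/" closes as </Directory>
      ["#{pad}<#{key}>", *render_directives(val, indent + 1), "#{pad}</#{tag}>"]
    else
      # Array() lets a directive take one value or several space-separated ones.
      ["#{pad}#{key} #{Array(val).join(' ')}"]
    end
  end
end

# Render a Hash of VirtualHost address => directives.
def render_virtual_hosts(vhosts)
  vhosts.flat_map do |name, config|
    ["<VirtualHost #{name}>", *render_directives(config), '</VirtualHost>']
  end.join("\n") + "\n"
end

puts render_virtual_hosts(
  '*:80' => {
    'ServerName' => 'www.sample.com',
    'Directory /var/www/sample.com/' => {
      'Options' => %w[IncludesNoExec FollowSymLinks MultiViews],
      'Limit GET POST' => { 'Require' => 'valid-user' }
    }
  }
)
```

In a cookbook, a helper like this could live in a library file and be called from the template, which keeps the template itself to a single line.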
Validation
Let’s say you want to do some validation. To be clear, it’s not Chef’s job to ensure the config is fully valid. For example, don’t get stuck reimplementing Apache’s configtest inside Chef. But you may want to check for some obvious badness or required keys. Or you may want to fill in defaults you could not set ahead of time (inside user-created hashes, for example).
Continuing with the VirtualHost example from above, let’s say we want to make sure everyone has a ServerAdmin entry in any VirtualHosts. They can fill their own in, but if they don’t, we want them to get a default of webmaster@<theirdomain>. We can’t provide defaults in the attributes file since we don’t know what VirtualHosts people will define. So we might do this in our recipe:
    whyrun_safe_ruby_block 'validate foo config' do
      block do
        node['apache']['virtual_hosts'].to_hash.each do |name, config|
          # Everyone has to have a ServerName
          fail "#{name} missing ServerName" unless config['ServerName']
          # You also have to have a ServerAdmin, but we can
          # build it for you.
          unless config['ServerAdmin']
            node.default['apache']['virtual_hosts'][name]['ServerAdmin'] =
              "webmaster@#{config['ServerName']}"
          end
        end
      end
    end

    template '/etc/apache/virtualhosts.conf' do
      owner 'root'
      group 'root'
      mode '0644'
    end
Now, no matter what VirtualHosts are defined, we can provide a sane default for ServerAdmin.
So why do that validation in a whyrun_safe_ruby_block instead of the template? Templates provide poor error messages and are notoriously difficult to debug, so we attempt to keep them as simple as possible.
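Keeping this logic in Ruby also makes it easy to exercise on its own. As a sketch, the same checks can be written as a pure function over a plain hash and run without a Chef node at all. The name validate_virtual_hosts and the choice of ArgumentError are our assumptions for this example, not something the recipe above requires.

```ruby
# Validate a Hash of VirtualHosts and fill in a default ServerAdmin.
# Returns a new Hash; raises on the first VirtualHost missing ServerName.
def validate_virtual_hosts(vhosts)
  vhosts.to_h do |name, config|
    raise ArgumentError, "#{name} missing ServerName" unless config['ServerName']

    # A user-supplied ServerAdmin wins; otherwise build one from ServerName.
    defaults = { 'ServerAdmin' => "webmaster@#{config['ServerName']}" }
    [name, defaults.merge(config)]
  end
end

checked = validate_virtual_hosts(
  '*:80'  => { 'ServerName' => 'www.sample.com' },
  '*:443' => { 'ServerName' => 'secure.sample.com', 'ServerAdmin' => 'ops@sample.com' }
)
puts checked['*:80']['ServerAdmin']  # webmaster@www.sample.com
puts checked['*:443']['ServerAdmin'] # ops@sample.com
```

A missing ServerName raises with the name of the offending VirtualHost, which is far easier to act on than an error from inside a template.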
Benefits
This approach has several benefits that we’ve seen at Facebook and that we believe the community can harness as well:
1) Cookbooks that allow users to change anything they want to change while still providing simple, easy-to-understand, easy-to-document APIs.
2) Cookbooks that require as little ongoing maintenance as possible.
3) Cookbooks that always clean up unwanted configuration by managing the entire configuration for some service/package/software as a single idempotent system rather than trying to manage each line in a file.
Our Cookbooks
We’ve released two of our cookbooks that provide simple examples of these sorts of APIs, and we hope to release more in the future. In addition, we’ve created a document that tries to encapsulate the ideas behind how and why we run Chef the way we do. Both of these are hosted on GitHub.
Our Tools
The first Chef-related tool we open-sourced (almost two years ago) was Grocery Delivery, a system for keeping Chef servers in sync with a source-code repo such as git or svn. GD is what allowed us to build out the massive Chef infrastructure we did and still ensure we could treat our many Chef servers as ephemeral machines and be able to lose or replace them at any time. We recently rewrote it in Ruby and made it available as a RubyGem and on GitHub. Like the original, it’s configurable and pluggable for maximum flexibility.
We also rewrote our testing system. Originally called chef_test, this was one of the things people were most excited to see from our talks. Unfortunately, the original version had far too many Facebook assumptions to share, but the rewrite (also in Ruby) is better, faster, generic, and based on Chef Zero. Like GD, it’s configurable and pluggable. Now called Taste Tester, it is available as a RubyGem and on GitHub.
Summary
We’re really excited to share not only our tools but also our ideas and practices with you all. We hope this can help people to leverage what we’ve learned as well as start some new discussions we can also learn from.