Maintaining real-time insight into the current state of your infrastructure is important. At Facebook, we’ve been working on a framework called osquery, which takes a different approach to low-level operating system monitoring.
Osquery exposes an operating system as a high-performance relational database. This design allows you to write SQL-based queries efficiently and easily to explore operating systems. With osquery, SQL tables represent the current state of operating system attributes, such as:
- running processes
- loaded kernel modules
- open network connections
SQL tables are implemented via an easily extendable API. Several tables already exist and more are being written. To best understand the expressiveness that is afforded to you by osquery, consider the following examples.
Example queries
This first example illustrates how you might use osquery to inspect the processes that are running on the current system. The query selects from the processes table, and its WHERE clause restricts the results to processes whose original binary, the one used to launch the process, no longer exists on the filesystem. Deleting the binary after launch is a common tactic used by malicious actors, so this query should not return any results on your system, assuming your system isn’t compromised.
SELECT name, path, pid FROM processes WHERE on_disk = 0;
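To try the shape of this query without osquery installed, here is a minimal sketch that runs the same SQL against an in-memory SQLite table using Python's standard library. The table here is an ordinary SQLite table that models only the columns the query touches, and every row (including the "dropper" process) is invented sample data, not real osquery output.

```python
import sqlite3

# Stand-in for osquery's "processes" table. Only the columns used by the
# query are modeled, and every row below is invented sample data.
SAMPLE_PROCESSES = [
    ("sshd", "/usr/sbin/sshd", 812, 1),
    ("nginx", "/usr/sbin/nginx", 1204, 1),
    ("dropper", "/tmp/.x/dropper", 4411, 0),  # binary removed after launch
]


def find_processes_without_binary(rows):
    """Run the query from the post against a mock processes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processes "
        "(name TEXT, path TEXT, pid INTEGER, on_disk INTEGER)"
    )
    conn.executemany("INSERT INTO processes VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT name, path, pid FROM processes WHERE on_disk = 0;"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, path, pid in find_processes_without_binary(SAMPLE_PROCESSES):
        print(f"{pid}\t{name}\t{path}")
```

On a healthy host the equivalent osquery query would be expected to return no rows; the mock data includes one offending row only so that the filter has something to match.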
Interacting with operating system state via SQL is fun and easy. One of the aspects of SQL that makes it so applicable to operating system analytics is the ability to join different tables together. Consider the following example, which uses data from both the “listening_ports” table and the “processes” table to find all processes that are listening on network ports. Both tables expose the pid of the process in question, so we can join the listening_ports table with the processes table from the last example on that column. This allows you to use generic tables to add context as you explore operating system state.
SELECT DISTINCT process.name, listening.port, listening.address, process.pid FROM processes AS process JOIN listening_ports AS listening ON process.pid = listening.pid;
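The join can be illustrated the same way. The sketch below builds two mock SQLite tables that model only the columns the query uses, and runs the query from the post unchanged. All rows are invented for illustration, including the duplicated port 80 entry, which is there to show what DISTINCT removes.

```python
import sqlite3

# The query from the post, split across lines for readability.
JOIN_QUERY = """
SELECT DISTINCT process.name, listening.port, listening.address, process.pid
FROM processes AS process
JOIN listening_ports AS listening ON process.pid = listening.pid;
"""

# Invented sample data; only the columns referenced by the query are modeled.
SAMPLE_PROCESSES = [("sshd", 812), ("nginx", 1204), ("bash", 2001)]
SAMPLE_PORTS = [
    (812, 22, "0.0.0.0"),
    (1204, 80, "0.0.0.0"),
    (1204, 80, "0.0.0.0"),   # duplicate row, collapsed by DISTINCT
    (1204, 443, "0.0.0.0"),
]


def listening_processes(processes, ports):
    """Join mock processes and listening_ports tables on pid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processes (name TEXT, pid INTEGER)")
    conn.execute(
        "CREATE TABLE listening_ports (pid INTEGER, port INTEGER, address TEXT)"
    )
    conn.executemany("INSERT INTO processes VALUES (?, ?)", processes)
    conn.executemany("INSERT INTO listening_ports VALUES (?, ?, ?)", ports)
    rows = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    # Sort so the output order does not depend on SQLite's query plan.
    return sorted(rows)


if __name__ == "__main__":
    for name, port, address, pid in listening_processes(
        SAMPLE_PROCESSES, SAMPLE_PORTS
    ):
        print(f"{name} (pid {pid}) listens on {address}:{port}")
```

Because the join is an inner join, a process with no listening socket (the mock "bash" row) does not appear in the results.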
There are many tables included with osquery and we’re creating more every day. Tables are easy to write, so we often encourage new contributors to develop a few tables as an introduction to the osquery codebase. For detailed documentation on how to create a table, see the guide on the wiki.
Features
Osquery is a framework we’ve used to create a few products and tools. Osquery’s modular codebase allows us to take advantage of existing concepts in new and interesting ways. We’re releasing several tools as a part of the open source release and we have more planned. We’re also looking forward to seeing how the community uses the codebase to create even more interesting tools.
Interactive query console
The interactive query console, osqueryi, gives you an SQL interface to try out new queries and explore your operating system. With the power of SQL and dozens of useful tables built in, osqueryi is an invaluable tool when diagnosing a systems operations problem or troubleshooting a performance issue.
For more information on how to use osqueryi, see the usage guide on the wiki.
Large-scale host monitoring
The high-performance host monitoring daemon, osqueryd, allows you to schedule queries for execution across your infrastructure. The daemon takes care of aggregating the query results over time, and generates logs which indicate state changes in your infrastructure. You can use this to maintain insight into the security, performance, configuration and state of your entire infrastructure. Osqueryd’s logging can integrate into your existing internal log aggregation pipeline, regardless of your technology stack, via a robust plugin architecture.
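To make the idea of logging state changes concrete, here is a rough sketch of comparing two runs of the same scheduled query and reporting which rows appeared and which disappeared. This is not osqueryd's implementation: the function names, the log line layout, and the sample rows are all hypothetical, chosen only to illustrate the concept.

```python
import json


def _row_key(row):
    """Stable, comparable representation of one result row (a dict)."""
    return json.dumps(row, sort_keys=True)


def diff_results(previous, current):
    """Return the rows added and removed between two runs of a query."""
    prev = {_row_key(r): r for r in previous}
    curr = {_row_key(r): r for r in current}
    added = [curr[k] for k in sorted(curr.keys() - prev.keys())]
    removed = [prev[k] for k in sorted(prev.keys() - curr.keys())]
    return {"added": added, "removed": removed}


def format_log_line(query_name, diff):
    """Hypothetical log format: one JSON object per scheduled query run."""
    return json.dumps({"query": query_name, **diff}, sort_keys=True)


if __name__ == "__main__":
    # Invented results from two consecutive runs of a listening-ports query.
    first_run = [{"name": "sshd", "port": 22}]
    second_run = [{"name": "sshd", "port": 22}, {"name": "nginx", "port": 80}]
    print(format_log_line("listening_ports", diff_results(first_run, second_run)))
```

Emitting only the differences keeps the log volume proportional to how much the host changes rather than to how often the query runs, which is one reason a change-based log suits an aggregation pipeline.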
If you’re interested in using osqueryd in your infrastructure, see the usage guide on the wiki as well as the internal deployment guide.
Cross platform
Osquery is cross platform. Even though osquery takes advantage of low-level operating system APIs, you can build and use osquery on Ubuntu, CentOS and Mac OS X. This has the distinct advantage of allowing you to monitor your corporate Mac OS X clients the same way you monitor your production Linux servers.
Native packages and extensive documentation
To make deployment as easy as possible, osquery comes with native packages for all supported operating systems. There’s extensive tooling and documentation around creating packages, so packaging and deploying your custom osquery tools can be just as easy, too.
To assist with the rollout process, the osquery wiki has detailed documentation on internal deployment. Osquery was built so that every environment-specific aspect of the toolchain can be hot-swapped at run-time with custom plugins. Use these interfaces to deeply integrate osquery into your infrastructure if none of the several existing plugins suits your needs.
You can find out more in the osquery wiki.
Modular codebase
Osquery’s codebase is made up of high-performance, modular components with documented public APIs. These components can be easily strung together to create new, interesting applications and tools. For information on the public API, see the wiki.
Open source
After talking with several external companies, it became clear to us that maintaining insight into the low-level behavior of operating systems is not a problem unique to Facebook. Over the past few months, we have released the osquery code and binaries to a small number of external companies. They have successfully deployed and tested osquery within their environments and they’ve given us great feedback.
We’re excited to announce that we’re open sourcing osquery today. You can check out the code and documentation on GitHub.
We’re looking forward to interacting with the community on future features. We do all of our work on osquery via GitHub, which makes working with external contributors a breeze. We hope you’ll see the potential in osquery and will build something amazing with us.
The osquery team within Facebook consists of Mike Arpaia, Ted Reed, Mimeframe and Javier Marcos de Prado.