Open-sourcing SPARTA to make abstract interpretation easy

Abstract interpretation is a powerful static analysis framework we can use to answer questions such as:

What range of values does this variable take?
What are all the types of the values that can flow into this function?
Is this piece of code ever executed, or is it dead?

Using abstract interpretation to build a scalable tool from scratch is a daunting engineering task that generally requires a protracted development effort led by an expert. To streamline that process, we built SPARTA, a C++ library of software components for building high-performance static analyzers that can run in a production environment. SPARTA provides the building blocks (a set of components that have a simple API, are highly performant, and can be easily assembled) so an engineer can focus solely on the logic that extracts the desired information from the program. The library ensures that the resulting analysis is sound, efficient, and scalable. For example, switching ReDex‘s register allocator from a conventional data flow analysis framework to SPARTA resulted in a 60 percent speedup and a 90 percent reduction in memory usage when performing register allocation on large apps.

HOW IT WORKS

SPARTA provides important data structures used in virtually every static analyzer (e.g., finite lattices, power set domains, and abstract environments). By encapsulating the complex implementation details of abstract interpretation, SPARTA frees the developer to focus on the three fundamental axes in the design of an analysis:

The program properties to analyze (range of numerical variables, aliasing relation)
The granularity of the properties analyzed (intraprocedural, interprocedural, context-sensitive)
The representation of the properties (sign, interval, alias graph)

All static analyses used in the ReDex code optimizer for Android are based on SPARTA’s library and run on industrial-scale Android apps from Facebook and other large companies that have adopted ReDex. SPARTA makes it possible for these analyses to be expressed concisely while guaranteeing a high level of performance. For example, ReDex’s liveness analysis — for determining whether and when each variable is used — can be done in fewer than 60 lines of code, which is about the same size as a simple data flow–based liveness implementation. However, the resulting analyzer easily scales to methods containing tens of thousands of blocks and thousands of variables, thanks to SPARTA’s high-performance algorithms and data structures.

For example, since liveness analysis requires recording the state of the program at every basic block, a naive implementation easily leads to a quadratic blowup. However, many basic blocks have program states in which many variables have the same values. To take advantage of this property, SPARTA allows us to model program state using a variant of a Patricia Tree, which needs to allocate memory for only the incremental differences between states.

In addition to providing performance benefits, SPARTA also makes it easier to write correct analyses. For instance, it requires the analysis writer to explicitly describe which abstract properties are more general than others. (Claiming that a variable can take any value between zero and INT_MAX is more general than saying it can take only value 1.) It does sanity checks to ensure that this description makes sense — for example, if A is more general than B, and B more general than C, C cannot be more general than A. In other words, the abstract domains must be partially ordered.

We also note that the algorithms and data structures implemented in SPARTA are highly generic. SPARTA is language-independent; despite its use in ReDex, it contains no Android-specific logic. In addition, its algorithms generalize across different scopes of static analyses: For example, its fixpoint iterator — the main engine that “solves” a static analysis — operates on the general concept of a graph, meaning that it can be applied to both a control-flow graph (as in the liveness analysis example above) and a whole-program call graph. As such, it can also be used to drive interprocedural analyses, such as the interprocedural constant propagation in ReDex.

This library is part of a broad, long-term vision to expand the use of abstract interpretation in our tools at Facebook. We are open-sourcing SPARTA to provide an industrial-scale abstract interpretation infrastructure for the broader static analysis community to use. We invite the community to contribute to SPARTA by implementing more abstractions, algorithms, and data structures. In the future, we could consider adding support for other features, including numerical abstractions or points-to analysis.