At Facebook, every change made to our mobile code is checked by our static analyzer Infer. Static code analyzers are tools that scan the source code of a program to detect possible bugs. The main advantage of a static analyzer is its ability to detect bugs without running the program and before the software is shipped to the users.
One benefit of Infer over other static analyzers is that it performs sophisticated interprocedural/interfile analysis at scale. That is, Infer can detect subtle bugs by tracking values through many procedure calls or where the procedures live across different files. Remarkably, although sophisticated and deep, Infer’s analyses scale well to large programs consisting of millions of lines of code.
Despite Infer’s advantages, one of its limitations has been its extensibility. Adding a checker for a new type of bug was a complex task and required a lot of deep static analysis expertise as well as knowledge about Infer’s internals. This was especially painful for writing new checkers that didn’t require all the sophistication that Infer offered. In fact, many significant software bugs are confined to the code of a single procedure (called intraprocedural). To detect these bugs, it may suffice to perform simpler analyses that do not require technical expertise in static analysis. These bugs can often be expressed by referring to the syntax of the program.
For this reason, we have introduced a new language called AL to easily design new checkers for detecting these kind of bugs. Doing that does not require any knowledge of the internal of Infer — AL worries about the internals of Infer for you. AL is a simple, declarative language for reasoning about abstract syntax trees, and it’s easy, quick, and interactive. Writing a new checker can normally be done with few lines of AL code, and when analyzing programs, Infer interprets the checkers without having to be recompiled. The effect is immediate, which helps in the design and tuning of the new check.
AL is open source and ships with Infer. Currently, AL works for C, C++, and Objective-C, and we plan to extend it to other languages in the future.
Writing a new checker
Let’s dive into a concrete example. Suppose we want to write the following checker in Objective-C:
“A property containing the word ‘delegate’ but not containing the word ‘queue’ should not be declared strong.”
Before AL, extending Infer to add a checker that would detect this scenario was a complex undertaking: You needed to hack Infer’s OCaml source code to write a traversal of the AST, write code that recognized the problematic case, add a new error type in Infer, and so on. Most of these tasks require deep knowledge of Infer’s internals.
AL simplifies the process of adding a checker. We just need to write an AL formula that expresses the sentence above and a message to report to the user. We then package that into a checker definition, and that’s it.
The AL checker for the property above is:
DEFINE-CHECKER STRONG_DELEGATE_WARNING = { LET name_contains_delegate = declaration_has_name(REGEXP("[dD]elegate")); LET name_does_not_contain_queue = NOT declaration_has_name(REGEXP("[qQ]ueue")); SET report_when = WHEN name_contains_delegate AND name_does_not_contain_queue AND is_strong_property() HOLDS-IN-NODE ObjCPropertyDecl; SET message = "Property or ivar %decl_name% declared strong"; SET suggestion = "In general delegates should be declared weak or assign"; };
The checker definition starts with the keyword DEFINE-CHECKER followed by the checker’s name. The first LET clause defines the formula variable name_contains_delegate
. It uses the AL’s predicate declaration_has_name
, which returns true/false depending on whether a declaration name contains the word specified in the regular expression. An AL’s predicate is a simple atomic formula evaluated on an AST node. AL provides a library of such predicates that can be used to build formulas, and the library is continuously growing. Formula variables can be used to simplify other definitions.
The SET report_when
is mandatory and defines a formula that, when evaluated to true, will tell Infer to report an error. This is the core formula that defines the checker and should express the error we are trying to detect. In our example, the formula is saying that we should report an error when visiting a node ObjCPropertyDecl
(that is, the AST node declaring a property in Objective-C), where it holds that the name contains “delegate” (name_contains_delegate
) and the name doesn’t contain “queue” (name_does_not_contain_queue
) and the node is defining a “strong” property (is_strong_property()
).
The clause SET message
defines the error message that will be displayed to the user. Notice that the message can include placeholders like %decl_name%
. Placeholders are evaluated by Infer and substituted by their current value when the error message is reported. In this case, it is the name of the declaration. The clause SET suggestion
is optional, and it’s used to give the developer hints on how to fix the bug.
Once defined, the checker should be added to a file that will be interpreted by Infer and checked in the analyzed program.
AL formulas
In AL, we can write much more complex formulas than the one in our example. In particular, AL is based on a mathematical logic called temporal logic.
You can see paths in the AST of the program as a discrete abstraction of time. More precisely, in AL, an AST node is a point in time, and its descendants are possible futures. AL has temporal operators that describe how the future can evolve (and therefore what we can find in the descendant of a node). One such operator is formula HOLDS-EVENTUALLY
expressing the fact that formula
should become true in some descendant of the current node. For example, the formula
call_method("removeObserver:") HOLDS-EVENTUALLY
evaluated in a node n becomes true if at some point in one of the descendants of n there is a node that is a call to the method “removeObserver:”
.
Internally, to check the validity of the formula, the AL engine is powered by what is technically called a model-checking algorithm for temporal logic to reason about syntax trees.
Applications
At Facebook, we have deployed several AL checkers that are run on every mobile code change together with other Infer analyses. Two examples are:
- DIRECT ATOMIC PROPERTY ACCESS: This checks that an ObjC property declared “atomic” should not be accessed directly via its underlying field (called ivar), as that could break atomicity and may give rise to race conditions.
- REGISTERED OBSERVER BEING DEALLOCATED: This checks that an observer object registered in NSNotificationCenter is unregistered before being deallocated. If not, the notification center will keep on sending messages to the deallocated object, making the app crash.
AL has proven to be quite useful for us at Facebook for quickly adding new checks in Infer. To date, AL’s checkers have reported thousands of bugs that our engineers fixed. We are still learning from this experience as we continue to develop AL. We are planning to improve the tools around the language (in particular, debugging tools) to make it even easier to write new checks in the future. Moreover, we are continuously extending AL’s libraries of predicates and macros to make it more expressive.
For Objective-C/C/C++ developers who want to extend Infer with a new checker but don’t have deep knowledge of static analysis technology, AL is a great solution. AL is interpreted by the open source Infer, and it is enough to add new checkers in the linter.al file containing existing linters.