Finding and fixing software bugs automatically with SapFix and Sapienz

Debugging code is drudgery. But SapFix, a new AI hybrid tool created by Facebook engineers, can significantly reduce the amount of time engineers spend on debugging, while also speeding up the process of rolling out new software. SapFix can automatically generate fixes for specific bugs, and then propose them to engineers for approval and deployment to production.

SapFix has been used to accelerate the process of shipping robust, stable code updates to millions of devices using the Facebook Android app — the first such use of AI-powered testing and debugging tools in production at this scale. We intend to share SapFix with the engineering community, as it is the next step in the evolution of automating debugging, with the potential to boost the production and stability of new code for a wide range of companies and research organizations.

SapFix is designed to operate as an independent tool, able to run either with or without Sapienz, Facebook’s intelligent automated software testing tool, which was announced at F8 and has already been deployed to production. In its current, proof-of-concept state, SapFix is focused on fixing bugs found by Sapienz before they reach production. The process starts with Sapienz, along with Facebook’s Infer static analysis tool, helping localize the point in the code to patch. Once Sapienz and Infer pinpoint a specific portion of code associated with a crash, they can pass that information to SapFix, which automatically picks from a few strategies to generate a patch.

How SapFix approaches debugging

This graphic illustrates how SapFix generates patches for software bugs.

To address high-firing bugs, SapFix creates patches that either fully or partially revert the code submission that introduced them. For more complex crashes, the system generates patches by drawing from its collection of templated fixes. These templates were automatically harvested from a pool of past fixes written by human engineers.
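As a rough illustration of the templated approach, the sketch below applies a single null-check template to a crashing statement. The `Crash` record, its fields, and the `null_guard_template` function are all hypothetical names invented for this example; the article does not describe what SapFix's harvested templates actually look like.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Crash:
    """Hypothetical crash record; the fields are illustrative assumptions."""
    kind: str              # e.g. "NullPointerException"
    statement: str         # source text of the crash-causing statement
    suspect_variable: str  # variable believed to be responsible


def null_guard_template(crash: Crash) -> Optional[str]:
    """Wrap the crashing statement in a null check when the crash kind matches."""
    if crash.kind != "NullPointerException":
        return None  # this template does not fit the crash
    return f"if ({crash.suspect_variable} != null) {{ {crash.statement} }}"


# Assumed template collection; a real one would hold many harvested templates.
TEMPLATES: List[Callable[[Crash], Optional[str]]] = [null_guard_template]


def first_matching_template(crash: Crash) -> Optional[str]:
    """Return the first patch any template produces, or None if none fit."""
    for template in TEMPLATES:
        patch = template(crash)
        if patch is not None:
            return patch
    return None
```

A `None` result here corresponds to the case where no template fits and another strategy has to take over.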

When previously used human-designed templates don’t fit, SapFix will attempt a mutation-based fix, whereby it performs small code modifications to the abstract syntax tree (AST) of the crash-causing statement, making adjustments to the patch until a potential solution is found.
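To make the idea of an AST-level mutation concrete, here is a minimal sketch that uses Python's standard `ast` module on a Python statement. This is purely illustrative: SapFix works on Android app code, and the article does not say which mutation operators it applies. The `guard_mutants` function and the choice of a "guard with a None check" operator are assumptions made for this example.

```python
import ast
from typing import List


def guard_mutants(statement: str) -> List[str]:
    """Return candidate patches, one per variable the statement reads.

    Each candidate wraps the original statement in an
    `if <variable> is not None:` guard. Requires Python 3.9+ for ast.unparse.
    """
    stmt = ast.parse(statement).body[0]
    # Collect the names that are read (not assigned) by the statement.
    names = sorted({
        node.id
        for node in ast.walk(stmt)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    })
    mutants = []
    for name in names:
        guard = ast.If(
            test=ast.Compare(
                left=ast.Name(id=name, ctx=ast.Load()),
                ops=[ast.IsNot()],
                comparators=[ast.Constant(value=None)],
            ),
            body=[stmt],
            orelse=[],
        )
        module = ast.Module(body=[guard], type_ignores=[])
        ast.fix_missing_locations(module)
        mutants.append(ast.unparse(module))
    return mutants
```

Calling `guard_mutants("label = user.name.upper()")` yields a single candidate that guards on `user`. Each candidate would still need to be built and tested before it could be considered a fix.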

Autonomous validation and human approval

Generating a patch is only part of SapFix’s work. The tool generates multiple potential fixes per bug and then evaluates their quality by checking for three issues: Are there compilation errors? Does the crash persist? Does the fix introduce new crashes?

To resolve the latter two questions, SapFix runs existing, developer-written tests on the patched builds, as well as tests created by Sapienz. And as with the previous, patch-generation step, this validation process happens autonomously and is isolated from the larger codebase. SapFix is replicating the kind of debugging work that people currently do, but it is not designed to deploy fixes to production code on its own.
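The three checks can be sketched as a simple filter over candidate patches. Everything named here is an assumption for illustration: `PatchResult`, its fields, and the crash-signature strings stand in for whatever build and test results the real system records, which the article does not specify.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PatchResult:
    """Hypothetical summary of building and testing one patched build."""
    patch_id: str
    compiles: bool
    crashes_seen: FrozenSet[str]  # crash signatures observed during testing


def is_valid(result: PatchResult, target_crash: str,
             baseline_crashes: FrozenSet[str]) -> bool:
    """Apply the three checks: compiles, crash gone, no new crashes."""
    if not result.compiles:
        return False  # check 1: compilation errors
    if target_crash in result.crashes_seen:
        return False  # check 2: the original crash persists
    # check 3: every crash still seen must already exist in the baseline
    return result.crashes_seen <= baseline_crashes


def surviving_patches(results: Iterable[PatchResult], target_crash: str,
                      baseline_crashes: FrozenSet[str]) -> List[str]:
    """Return the ids of the patches that pass all three checks."""
    return [r.patch_id for r in results
            if is_valid(r, target_crash, baseline_crashes)]
```

Only the patches that survive this filter would go on to human review.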

When its patches are fully tested, SapFix sends them to a human reviewer for approval. This is very similar to how human-generated reports are checked and approved by other developers, except that the system automatically tracks reviewers’ feedback, lands accepted patches, and then cleans up the other patches. In some cases, SapFix may pick the best fix out of several options and present its recommendations to engineers.

This workflow illustrates how SapFix seeks the engineer’s feedback on the fix it generates.

SapFix then abandons the patch if it is rejected and lands it if it is accepted. So, as powerful as its underlying tech may be, and as much time and effort as it saves by operating autonomously, SapFix can’t implement its own proposed fixes. Engineers are always in the loop, and this tool relies on their expertise to confirm whether a proposed fix should be deployed.
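The review bookkeeping described above can be modeled as a small state update. This is a sketch under stated assumptions, not a description of Facebook's review tooling: the `Status` values and the `apply_review` function are hypothetical, and the rule that accepting one patch abandons its still-open siblings is one reading of "cleans up the other patches."

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    PROPOSED = "proposed"
    LANDED = "landed"
    ABANDONED = "abandoned"


def apply_review(statuses: Dict[str, Status], patch_id: str,
                 accepted: bool) -> Dict[str, Status]:
    """Return updated statuses after a human reviewer's decision on one patch.

    The input mapping is left unmodified. A decision is only valid for a
    patch that is still awaiting review.
    """
    if statuses.get(patch_id) is not Status.PROPOSED:
        raise ValueError(f"patch {patch_id!r} is not awaiting review")
    updated = dict(statuses)
    if accepted:
        # Clean up the other open candidates for the same bug.
        for other_id, status in updated.items():
            if status is Status.PROPOSED:
                updated[other_id] = Status.ABANDONED
        updated[patch_id] = Status.LANDED
    else:
        updated[patch_id] = Status.ABANDONED
    return updated
```

Note that nothing reaches `LANDED` without an explicit `accepted=True` from a reviewer, which mirrors the human-in-the-loop design.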

Since SapFix is still in development, it isn’t being used at the same scale as Sapienz, which now produces hundreds of monthly bug reports pinpointing the exact lines at fault, as it vets code related to the Facebook, Instagram, Workplace, and Messenger apps for Android. Approximately three quarters of Sapienz reports resulted in fixes by developers. But since we started testing SapFix in August, the tool has successfully generated patches that have been accepted by human reviewers and pushed to production.

Paving the way for fully automated debugging

To our knowledge, this marks the first time that a machine-generated fix — with automated end-to-end testing and repair — has been deployed into a codebase of Facebook’s scale. It’s an important milestone for AI hybrids and offers further evidence that search-based software engineering can reduce friction in software development. As we develop SapFix to work with different kinds of bugs and software, the tool has the potential to change the speed and quality of code generation. That’s true not just for companies that operate at large scales, but also for nearly anyone who creates code. Whether used together or separately, SapFix and Sapienz let developers spend less time on debugging and more on generating what’s next.

But with this work we also want to encourage ongoing research into automated fixing and improvement of code. There has been great excitement in the scientific literature about this area, with empirical studies of techniques, enticing sets of open problems and challenges for the scientific community to tackle, and surveys of recent results on automatically improving code. As the first tool of its kind deployed at this scale, SapFix will provide renewed impetus and energy for this exciting but challenging research agenda.

Sapienz and now SapFix are both intended for open source release in the future, once additional engineering work is completed, and the feedback we receive for these tools will help us — and the wider AI community — improve the collective task of automating the finding and fixing of code bugs. And while we’re currently focusing on how SapFix can automatically head off crashes before they happen, the longer-term applications could include making software faster and more responsive. These systems offer significant baseline benefits, and their impact promises to be as varied and wide-ranging as the developers who will use them.

We’d like to thank the following engineers and acknowledge their contributions to SapFix: Alex Marginean, Johannes Bader, Satish Chandra, Alexander Mols, and Andrew Scott.