The state of the research
Diff Risk Score (DRS) is an AI-powered technology built at Meta that predicts the likelihood of a code change causing a production incident, also known as a SEV. Built on a fine-tuned Llama LLM, DRS evaluates code changes and metadata to produce a risk score and highlight potentially risky code snippets. Today, DRS powers many risk-aware features that optimize product quality, developer productivity, and computational capacity efficiency. Notably, DRS has helped us eliminate major code freezes, letting developers ship code during periods when they historically could not, with minimal impact to the customer experience and the business.
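Conceptually, DRS maps a diff plus its metadata to a risk score and a set of flagged snippets. The sketch below is a hypothetical illustration of that interface, not Meta's actual API: the `score_diff` function, the `DiffRisk` dataclass, and the toy heuristic standing in for the fine-tuned LLM are all assumptions made for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class DiffRisk:
    """Hypothetical result of a DRS-style evaluation."""
    score: float  # probability-like risk score in [0, 1]
    risky_snippets: list = field(default_factory=list)  # flagged lines of the diff

def score_diff(diff_text: str, metadata: dict) -> DiffRisk:
    """Illustrative stand-in for a fine-tuned LLM risk model.

    A real system would feed the diff and its metadata (author, files
    touched, test signals, etc.) to the model; here a trivial keyword
    heuristic keeps the sketch runnable.
    """
    risky = [line for line in diff_text.splitlines()
             if "timeout" in line or "retry" in line]
    base = 0.1 + 0.2 * len(risky) + 0.05 * metadata.get("files_changed", 0)
    return DiffRisk(score=min(base, 1.0), risky_snippets=risky)

result = score_diff("+    timeout = 0  # disable timeout", {"files_changed": 3})
print(round(result.score, 2), len(result.risky_snippets))  # → 0.45 1
```

The key design point is that the model's output is two things at once: a scalar score that downstream tooling can threshold on, and localized snippets that a human reviewer can act on.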
Why it matters
Software development is fraught with risk, especially for intricate, rapidly evolving, and scaled products and technologies. Because Meta operates at a global scale, we need the best tools possible to mitigate risk and to protect both user experience and advertiser outcomes.
AI is transforming how we build products, so we committed ourselves to applying AI to improve every aspect of the software development process. Production risk was one of the areas we tackled first. We theorized that, if equipped with a model that could predict if a code change might cause a SEV, we could build features and workflows to improve almost every aspect of writing and pushing code.
Since DRS use cases are too numerous to cover in depth here, we’ll focus on one: code unfreeze. At Meta, production incidents can significantly harm both the user experience and advertiser outcomes. For this reason, some teams have historically “frozen” major parts of the codebase during sensitive periods like the Cyber 5 holiday shopping week, preventing engineers from shipping code in order to reduce incident risk. For certain teams, DRS has since reduced the scope of these holiday code freezes, leading to significant productivity improvements.
While this had clear reliability benefits, the tradeoff was a substantial reduction in productivity. DRS enabled a more nuanced approach, letting developers land lower-risk changes during these periods while minimizing production incidents, thus protecting the user experience, the business, and productivity. In fact, DRS has driven meaningful productivity gains across many sensitive periods. During one such period, a major partner event in 2024, we landed 10,000+ code changes (that previously could not have landed during a freeze) with minimal production impact, enabling continued innovation and customer success. What’s more, by managing productivity and risk in this way, we benefit twice: through more code landed and through less engineering time spent detecting, understanding, and mitigating production incidents.
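The unfreeze policy described above amounts to a threshold gate: during a sensitive period, only changes whose predicted risk falls below a cutoff may land, while riskier changes are held for extra scrutiny. A minimal sketch of that logic, assuming a hypothetical `RISK_THRESHOLD` and decision labels (not Meta's production values):

```python
RISK_THRESHOLD = 0.3  # hypothetical cutoff for landing during a freeze

def freeze_decision(risk_score: float, freeze_active: bool) -> str:
    """Decide whether a code change may land, given its predicted risk."""
    if not freeze_active:
        return "land"            # normal periods: no risk gate
    if risk_score < RISK_THRESHOLD:
        return "land"            # low-risk changes ship even during a freeze
    return "hold_for_review"     # high-risk changes wait or get extra scrutiny

print(freeze_decision(0.1, freeze_active=True))   # → land
print(freeze_decision(0.8, freeze_active=True))   # → hold_for_review
```

The gate replaces a binary freeze (nothing lands) with a graduated one (most changes land), which is where the productivity gains come from.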
Code unfreeze works well, but it’s just the start of what the technology can do. Understanding risk, even imperfectly and at a statistical level, has driven improvements for Meta in more ways than we anticipated – there are now 19 use cases for risk tooling, and the number is growing.
Where we’re headed next
The success of DRS has spurred the creation of new risk-aware features across Meta that span the entire development lifecycle, from planning to post-release monitoring. The demand to build such features also led us to build the Risk Awareness Platform to provide risk analysis APIs and tool integrations.
We envision four major directions for risk awareness in the coming months and years.
First, while we’ve seen an explosion of DRS-powered features on the Risk Awareness Platform – from optimizing build and test selection to improving reliability, selecting code reviewers, and analyzing release risks – we believe this is only the beginning. A critical problem in software engineering is maximizing the rate of innovation subject to a reliability threshold, so the applications of risk understanding are virtually inexhaustible. We believe code risk can play a significant role in improving this tradeoff, so we will build more risk-aware features while improving their quality. As the risk model, feature data, and user experiences improve, we’ll see greater real-world benefits for people who use Meta’s products and businesses that advertise with Meta.
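The tradeoff above can be made concrete: given historical (risk score, caused-incident) pairs, pick the most permissive cutoff whose empirical incident rate among landed changes stays within a reliability budget. This is a hypothetical sketch on made-up data; `pick_threshold`, the history format, and the budget value are illustrative assumptions.

```python
def pick_threshold(history, sev_budget):
    """Return the most permissive risk cutoff whose empirical incident
    rate stays within budget.

    history: list of (risk_score, caused_incident) pairs (hypothetical data)
    sev_budget: max tolerated fraction of landed changes causing incidents
    """
    best = 0.0
    for cutoff in [i / 100 for i in range(1, 101)]:
        landed = [(s, sev) for s, sev in history if s < cutoff]
        if not landed:
            continue
        rate = sum(sev for _, sev in landed) / len(landed)
        if rate <= sev_budget:
            best = cutoff  # a higher cutoff lands more changes
    return best

# Made-up history: most low-risk diffs were safe; high-risk ones mixed.
history = [(0.05, 0), (0.1, 0), (0.2, 0), (0.4, 1), (0.6, 0), (0.9, 1)]
print(pick_threshold(history, sev_budget=0.2))  # → 0.9
```

Raising the budget lands more changes at the cost of more incidents; the platform's job is to make that dial explicit instead of implicit.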
Second, we will expand beyond code change risk to configuration change risk. While code changes cause the plurality of SEVs at Meta, configuration changes are another large category. For this reason, we’ve expanded the Risk Awareness Platform to include models that predict the risk of various config changes. These efforts are state of the art, focused on an open research area, and earlier on the research-to-production continuum, but we believe they will soon power feature families of their own, much like DRS does today.
Third, we want to automate the risk mitigation step. Instead of flagging risky diffs and recommending appropriate reviewers or rollback mechanisms, we want to use AI agents to proactively generate risk-mitigating changes. This can be done for code in motion (i.e. diffs or pull requests) and for code at rest to lower baseline codebase risk. Additionally, once we are armed with a greater understanding of configuration risks, these agents will be able to operate flexibly across both code and config changes.
Fourth, we will increasingly use natural language outputs to show humans what these risk-aware technologies are doing and why. By helping engineers understand the rationale behind the risk score, we’ll empower them to either mitigate risks or give the model feedback to improve accuracy. This creates a learning loop for improving both our risk models and the end user experience. LLM explainability remains an open area of research, but our teams are actively working to offer answers to common questions.
We are excited for the future of risk-aware software development, and we look forward to learning from—and with—our colleagues in industry as we make progress in this valuable domain.
Read the papers
“Moving Faster and Reducing Risk: Using LLMs in Release Deployment”
“Leveraging Risk Models to Improve Productivity for Effective Code Un-Freeze at Scale”
Acknowledgements
We would like to thank all the team members and the leadership that contributed to making the DRS effort successful at Meta. Rui Abreu, David Amsallem, Parveen Bansal, Kaavya Chinniah, Brian Ellis, James Everingham, Peng Fan, Ford Garberson, Jun Ge, Kelly Hirano, Kosay Jabre, David Khavari, Sahil Kumar, Ajay Lingapuram, Yalin Liu, Audris Mockus, Megh Mehta, Vijayaraghavan Murali, Venus Montes, Aishwarya Girish Paraspatki, Akshay Patel, Brandon Reznicek, Peter C Rigby, Maher Saba, Babak Shakibi, Roy Shen, Gursharan Singh, Matt Steiner, Weiyan Sun, Ryan Tracy, Siri Uppalapati, and Nachiappan Nagappan.