What the research is:
DELF is a new framework that helps developers implement data deletion in modern applications. Traditional methods for implementing deletion require application developers to write repetitive, error-prone code. DELF’s main novelty is that developers can implement deletion in every product they build with minimal effort, by writing annotations rather than code. DELF also introduces multiple correctness validation techniques that help achieve semantic correctness and avoid mistakes that could lead to retaining data that should have been deleted or to accidentally deleting the wrong data.
We’ve deployed DELF over the past few years at Facebook, where it helps honor deletion requests by people using diverse products and services. DELF helps Facebook developers enforce that the products they build handle deletion early in the development process—before any data can even be stored on the back end. DELF currently supports the execution of billions of deletions every day.
How it works:
DELF builds on a structured data type specification language. Instead of directly calling database APIs to delete data, developers write structured data type specifications that include annotations stating expected behavior when data is deleted. Annotations include descriptions of when the data should be deleted and whether DELF should also cascade and delete related objects. In the data type example below, developers annotate the created_photo edge between an account and its photos with the deep annotation to indicate that when an account gets deleted, all photos that the account created should be deleted as well.
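To make the idea of annotated data type specifications concrete, here is a minimal Python sketch. It is not DELF's actual specification language. The class names (`OnDelete`, `EdgeSpec`, `ObjectTypeSpec`), the `shallow` annotation value, and the `follows` edge are assumptions introduced for illustration; only the `created_photo` edge and the `deep` annotation come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class OnDelete(Enum):
    """Hypothetical annotation values; only `deep` is named in the text."""
    DEEP = "deep"        # deleting the source object also deletes the target
    SHALLOW = "shallow"  # assumed counterpart: only the edge itself goes away


@dataclass(frozen=True)
class EdgeSpec:
    """One edge of a data type, with its deletion annotation."""
    name: str
    target_type: str
    on_delete: OnDelete


@dataclass(frozen=True)
class ObjectTypeSpec:
    """A data type declared as a specification instead of deletion code."""
    name: str
    edges: Tuple[EdgeSpec, ...] = ()

    def cascading_edges(self) -> List[EdgeSpec]:
        # Edges a deletion framework would follow when this type is deleted.
        return [e for e in self.edges if e.on_delete is OnDelete.DEEP]


# An account whose photos are deleted along with it. The `follows` edge is
# a made-up example of an edge that should not cascade.
ACCOUNT = ObjectTypeSpec(
    name="account",
    edges=(
        EdgeSpec("created_photo", "photo", OnDelete.DEEP),
        EdgeSpec("follows", "account", OnDelete.SHALLOW),
    ),
)

print([e.name for e in ACCOUNT.cascading_edges()])
```

The point of the design is that the developer states intent once, in the type declaration, and never writes the code that walks from an account to its photos.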
When a person issues a deletion request, DELF, rather than the application, executes the request to completion without requiring further input from developers. It begins by immediately hiding the target object and uses read-time checks to hide any dependent data in the product. It then asynchronously traverses the graph of data to delete, issuing point deletes to the underlying data stores. At the same time, to avoid data loss in scenarios where the wrong data is inadvertently deleted, DELF records a restoration log, which lets engineers perform an “undo” operation for a limited time and simplifies data recovery. DELF monitors the progress of all deletions, detects any errors that occur, and retries deletions automatically. It surfaces persistent errors to developers to fix, and continues until all deletions eventually complete.
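The flow described above (hide, traverse, point delete, log for undo, retry) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DELF's implementation: `InMemoryStore`, `delete`, `restore`, and `max_attempts` are hypothetical names, the traversal runs synchronously rather than asynchronously, and read-time hiding covers only the target object rather than its dependents.

```python
from collections import deque


class InMemoryStore:
    """Toy stand-in for a data store that only offers point deletes."""

    def __init__(self, objects, deep_edges):
        self.objects = dict(objects)        # object id -> payload
        self.deep_edges = dict(deep_edges)  # object id -> ids to cascade to
        self.hidden = set()

    def read(self, obj_id):
        # Read-time check: a hidden object looks deleted to the product.
        if obj_id in self.hidden:
            return None
        return self.objects.get(obj_id)

    def point_delete(self, obj_id):
        # Returns the removed payload, or None if nothing was stored.
        return self.objects.pop(obj_id, None)


def delete(store, root_id, max_attempts=3):
    """Hide the root, then walk deep edges breadth-first and point-delete.

    Returns (restoration_log, failed_ids).
    """
    store.hidden.add(root_id)
    restoration_log, failed = [], []
    queue, seen = deque([root_id]), {root_id}
    while queue:
        obj_id = queue.popleft()
        for child in store.deep_edges.get(obj_id, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
        for _ in range(max_attempts):
            try:
                payload = store.point_delete(obj_id)
            except IOError:
                continue  # transient error: try again
            if payload is not None:
                restoration_log.append((obj_id, payload))
            break
        else:
            failed.append(obj_id)  # persistent error: surface to developers
    return restoration_log, failed


def restore(store, restoration_log):
    """Undo a deletion by replaying the restoration log."""
    for obj_id, payload in restoration_log:
        store.objects[obj_id] = payload
        store.hidden.discard(obj_id)


store = InMemoryStore(
    {"post": "hello", "c1": "nice", "r1": "thanks", "other": "keep"},
    {"post": ["c1"], "c1": ["r1"]},
)
log, failed = delete(store, "post")
print(sorted(store.objects), failed)
```

Recording each payload before it disappears is what makes the undo possible: `restore(store, log)` puts the post, its comment, and the reply back.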
The figure below presents DELF in operation, executing the deletion of a sample post with comments and comment replies.
To respect the intent of deletions, DELF relies on the correctness of annotations. DELF includes a wide range of annotation correctness checks to detect and resolve mistakes that developers inadvertently introduce, and to ensure that annotations are present and correct. Static validation enforces that all data types have sufficient deletion annotations before any new data is collected. Dynamic validation heuristics inspect production data and surface potentially incorrect annotations for developers to review and fix. Privilege escalation checks help block potentially malicious deletions by finding instances where a deletion affects data that a person should not be able to delete, and data type validation finds object fields that contain implied references to other objects to ensure that developers do not bypass DELF.
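As an illustration of the static validation idea only, the following Python sketch rejects a schema in which any edge lacks a deletion annotation. The schema layout (a dict of type names to edge annotations), the function name `validate_schema`, the `liked_photo` edge, and the allowed annotation set are all assumptions made for this example. DELF's real checks are richer and are not shown here.

```python
# Hypothetical set of accepted annotation values for this sketch.
ALLOWED_ANNOTATIONS = {"deep", "shallow"}


def validate_schema(schema):
    """Return a list of problems; an empty list means the schema passes.

    `schema` maps a type name to a dict of edge name -> annotation,
    where a missing annotation is represented as None.
    """
    problems = []
    for type_name, edges in schema.items():
        for edge_name, annotation in edges.items():
            if annotation is None:
                problems.append(
                    f"{type_name}.{edge_name}: missing deletion annotation"
                )
            elif annotation not in ALLOWED_ANNOTATIONS:
                problems.append(
                    f"{type_name}.{edge_name}: unknown annotation {annotation!r}"
                )
    return problems


schema = {
    "account": {
        "created_photo": "deep",
        "liked_photo": None,  # the developer forgot to annotate this edge
    },
}
problems = validate_schema(schema)
print(problems)
```

Running a check like this when a type is defined, and refusing to proceed while the list is non-empty, is one way to guarantee that annotations exist before any data of that type is stored.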
Why it matters:
Deletion is an important privacy expectation from the people using our applications and services: It’s a simple concept that is straightforward to invoke, and past research has shown that it’s widely used across online services. People trust services with their data, and when they are no longer confident that they want their data to exist online, they delete it and expect that deletion to take effect quickly and completely.
However, while deletion is simple as a concept, it is complex to execute. The architecture of modern distributed data stores makes it challenging for developers to implement deletion. While these stores generally offer a point deletion API (e.g., delete this row), they offload to applications the work of figuring out when to invoke that API and with which arguments. Without a framework like DELF, well-meaning developers make mistakes. These mistakes can lead to retaining data that should be deleted, to inadvertent deletions, or even to exploitable vulnerabilities that can be used to delete arbitrary data in an application. Apart from DELF, we know of no systematic approach for helping well-meaning developers implement deletion.
We hope that the lessons we’ve learned while developing and deploying DELF will help others implement data deletion in their own systems and motivate further research to continue advancing the state of the art in this important area.
Read the full paper:
This is joint work with Georgios Damaskinos, Benjamin Strahs, Daniel Obenshain, Divino Neto, Paul Pearce, Joshi Cordova, and Benoît Reitz. We’d like to acknowledge Ben Mathews and Scott Renfro for bootstrapping DELF, and we thank Leonardo Aoun, Adarsh Koyya, Akin Ilerle, Amitsing Chandele, Andrei Bajenov, Anurag Sharma, Boris Grubic, Gerard Goossen, Cristina Grigoruta, Gustavo Pacianotto Gouveia, Gustavo Pereira De Castro, Huseyin Olgac, Jordan Webster, Mahdy Nasr, Maria Mateescu, Masha Kereb, Merna Rezk, Nikita Efanov, Ohad Almagor, Oleksandr Manzyuk, Prakash Verma, Shradha Budhiraja, Shubhanshu Agrawal, Sneha Padgalwar, Tudor Tiplea, and Vasil Vasilev for their work on DELF.