Meta is developing new privacy-enhancing technologies (PETs) to innovate and solve problems with less data. These technologies enable teams to build and launch privacy-enhanced products in a way that’s verifiable and safeguards user data. Using state-of-the-art cryptographic techniques, we have developed Private Data Lookup (PDL), which allows users to privately query a server-side data set. PDL is based on a secure multiparty computation technique called Private Set Intersection, in which two parties, each holding a set, compute the intersection of their sets without revealing those sets to each other. PDL further ensures that only one party (i.e., Meta users) sees the result. This prevents Meta from learning the result of the intersection and thus enhances the privacy of users’ data.
We use PDL for data minimization, starting with first-party passwords in Enterprise Center, Meta’s new platform for collaboration between external partners and Meta. With PDL, we encourage the use of stronger passwords while minimizing the information revealed to the server during the password precheck process.
Creating a password is the first step in the authentication cycle for most users. Identifying weak passwords at this step therefore offers a stronger security stance than detecting them after they are already in use. Traditional password guidance includes a list of best practices, but even passwords that satisfy those requirements can be leaked in breaches. Thus, proactively checking for compromised passwords complements password strength guidelines and helps users choose strong, secure passwords.
Specifically, PDL supports the breached password check in Enterprise Center’s password creation flows, including account creation and password reset. Enterprise Center users now receive an alert if they attempt to use a password that was previously exposed in a data breach, based on breach data collected by third parties (e.g., FlashPoint.io, HoldSecurity.com). A traditional server-side password hash check reveals every password creation attempt to the server. PDL instead delivers the alert while preserving privacy: Meta Enterprise Center learns neither which passwords the user attempted nor whether any of them was previously exposed. The goal is to limit the information Enterprise Center ultimately collects to the strong password the user picks.
How PDL supports private password precheck
The challenge of privately checking a password entered by a user against a set of passwords known to have been exposed in third-party data breaches falls into an area of applied cryptography known as Private Set Intersection. Private Set Intersection allows two parties, each holding a set of sensitive data (passwords in this case), to compute the items common to both sets without either party revealing the contents of its set to the other. PDL provides the functionality of Private Set Intersection, and its design is inspired by the research paper by Thomas et al. One distinction from previous work is that we check whether the password appears anywhere in the breach data, whereas previous solutions alert the user only when the specific (username, password) pair appears in a breach. We made this choice because it is more relevant to targeted attacks on highly sensitive accounts: in such attacks, malicious actors are likely to try every breached password in conjunction with the target’s username. For example, if a strong password associated with one specific username appears in a breach, all users should avoid that password.
In a simplified version of our password precheck workflow over PDL, when making a request, a client calculates the hash H(p) of its password p and then blinds the hash output with a secret key a that is randomly generated for each request. After that, the client sends this blinded hash value, denoted by H(p)^a, to our service.
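The client step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Meta’s implementation. The group (integers modulo the Mersenne prime 2**127 - 1), the SHA-256-based `hash_to_group` stand-in for H, and the function names `random_exponent` and `make_request` are all hypothetical choices that let the sketch run on the standard library alone. Blinding here is modular exponentiation, which commutes, so (H(p)^a)^b = (H(p)^b)^a. This toy group is not secure; a real deployment would typically use a standardized prime-order group with a proper hash-to-group construction.

```python
import hashlib
import math
import secrets

# Toy group (assumption, not secure): integers modulo the Mersenne prime 2**127 - 1.
# Chosen only so the sketch runs with the standard library.
P = 2**127 - 1


def hash_to_group(password: str) -> int:
    """Stand-in for H(p): map a password to a nonzero residue modulo P."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Draw a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
        if math.gcd(k, P - 1) == 1:
            return k


def make_request(password: str) -> tuple[int, int]:
    """Client side: return the per-request key a and the blinded value H(p)^a mod P."""
    a = random_exponent()  # fresh for every request, never sent to the server
    blinded = pow(hash_to_group(password), a, P)
    return a, blinded
```

The client keeps `a` locally and sends only `blinded`. Because `a` is invertible modulo P - 1 in this sketch, the client can later strip its own blinding by raising a value to the inverse of `a`.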
Upon receiving the request, the password precheck service (“the service”) in Meta Enterprise Center first blinds the client’s request with a long-term secret key b. The result is a double-blinded hash of the client’s original password, denoted H(p)^ab. The server then applies the same hash function and blinding operation with secret key b to every password in the leaked password dataset, producing a list of blinded hash values H(p_1)^b, H(p_2)^b, …, H(p_n)^b. The server sends back the double-blinded query together with this list of single-blinded hash values.
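A sketch of the server step shows both operations: raising the client’s blinded value to the long-term key b, and blinding every leaked password with b at request time. It rests on illustrative assumptions rather than Meta’s code: a toy group modulo the prime 2**127 - 1, a SHA-256-based stand-in for H, and hypothetical names such as `server_respond`.

```python
import hashlib
import math
import secrets

# Toy group (assumption, not secure): integers modulo the Mersenne prime 2**127 - 1.
P = 2**127 - 1


def hash_to_group(password: str) -> int:
    """Stand-in for H(p): map a password to a nonzero residue modulo P."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Draw a secret exponent that is invertible modulo P - 1 (used here for key b)."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def server_respond(blinded_request: int, b: int, leaked: list[str]) -> tuple[int, list[int]]:
    """Server side: double-blind the request and single-blind every leaked password."""
    double_blinded = pow(blinded_request, b, P)                       # H(p)^(ab)
    single_blinded = [pow(hash_to_group(pw), b, P) for pw in leaked]  # H(p_i)^b
    return double_blinded, single_blinded
```

Computing `single_blinded` inside the request handler means one hash and one exponentiation per leaked password on every request, so the work grows linearly with the size of the dataset.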
After receiving the response, the client uses its secret key a to remove its own blinding from the double-blinded hash (by raising it to the power of a’s inverse). This leaves a hash value blinded only by the service’s secret key b, i.e., H(p)^b. The client can then match H(p)^b against the list of blinded hash values: if the client’s password p matches a leaked password p_i, there will be a match, because H(p)^b equals H(p_i)^b.
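Removing the client’s blinding works because exponents compose. If a^(-1) is the inverse of a modulo the group order, then raising H(p)^(ab) to a^(-1) leaves H(p)^b. The sketch below runs one simplified round trip end to end. It relies on illustrative assumptions and is not Meta’s implementation: a toy group modulo the prime P = 2**127 - 1 (whose group order is P - 1), a SHA-256-based stand-in for H, and the hypothetical names `client_match` and `round_trip`.

```python
import hashlib
import math
import secrets

# Toy group (assumption, not secure): integers modulo the Mersenne prime 2**127 - 1.
P = 2**127 - 1


def hash_to_group(password: str) -> int:
    """Stand-in for H(p): map a password to a nonzero residue modulo P."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Draw a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def client_match(double_blinded: int, a: int, single_blinded: list[int]) -> bool:
    """Client side: strip the blinding by a, then look for H(p)^b locally."""
    a_inv = pow(a, -1, P - 1)                  # a * a_inv = 1 (mod P - 1)
    unblinded = pow(double_blinded, a_inv, P)  # (H(p)^(ab))^(a_inv) = H(p)^b
    return unblinded in set(single_blinded)


def round_trip(password: str, leaked: list[str], b: int) -> bool:
    """One simplified request end to end; only the client learns the result."""
    a = random_exponent()
    request = pow(hash_to_group(password), a, P)                      # client -> server
    double_blinded = pow(request, b, P)                               # server: H(p)^(ab)
    single_blinded = [pow(hash_to_group(pw), b, P) for pw in leaked]  # server: H(p_i)^b
    return client_match(double_blinded, a, single_blinded)            # client, locally
```

Because b is also invertible modulo P - 1 in this sketch, x -> x^b is one-to-one on the group. A match on H(p)^b therefore implies a match on H(p).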
In this implementation, the user’s password is well protected: it is one-way hashed and then blinded with the client’s one-time secret key, so the request reveals no information about it to the service. In addition, the service learns nothing about the matching result, because the matching happens entirely on the client.
As one may already have noticed, this initial version has several issues. First, hashing and blinding every password in the leaked password dataset at runtime adds significant server-side latency. Second, having the client download the blinded hash values of all leaked passwords is impractical in terms of latency and bandwidth, because there can be millions of them.
We determined that this default implementation would degrade the user experience because of the added processing time and the amount of data transferred between client and server. To address this challenge, we adopted the following optimizations:
- Pre-processing of compromised password data into blinded hash values. To avoid expensive cryptographic operations at run time and improve performance, the compromised password dataset is pre-processed into a format that can be returned directly to the client.
- Sharding the leaked password dataset. Instead of returning blinded hash values for the entire leaked password dataset, we let the client generate a small sharding index from the first couple of bytes of the password hash. The added leakage and privacy risk is negligible because millions of passwords can share the same index, and we choose the index size carefully to balance privacy and performance. The index lets the server return only the matching subset of the dataset in each response.
- Compression of the blinded hash values returned by the service. To reduce the bandwidth of the service’s response, we truncate each blinded hash value to a shorter length that is still long enough for distinct values to remain distinguishable when matching.
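The three optimizations can be combined in one sketch. Everything specific here is an assumption for illustration, not a documented Meta parameter: the toy group modulo 2**127 - 1, a 2-byte shard prefix, 8-byte truncation, and the helper names `preprocess`, `serve`, and `check`. The server blinds the leaked dataset once, offline, and files each truncated H(p_i)^b under the shard index of the unblinded hash of p_i. At request time, it performs a single exponentiation and returns one shard.

```python
import hashlib
import math
import secrets
from collections import defaultdict

# Toy group and sizes (assumptions for illustration, not production values).
P = 2**127 - 1
SHARD_PREFIX_BYTES = 2  # shard index = first 2 bytes of SHA-256(p): 65,536 shards
TRUNCATED_BYTES = 8     # keep the last 8 bytes of each blinded value


def password_digest(password: str) -> bytes:
    return hashlib.sha256(password.encode("utf-8")).digest()


def to_group(digest: bytes) -> int:
    """Stand-in for H(p): map a digest to a nonzero residue modulo P."""
    return int.from_bytes(digest, "big") % (P - 1) + 1


def shard_index(digest: bytes) -> int:
    return int.from_bytes(digest[:SHARD_PREFIX_BYTES], "big")


def truncate(value: int) -> bytes:
    return value.to_bytes(16, "big")[-TRUNCATED_BYTES:]  # P < 2**128, so 16 bytes fit


def random_exponent() -> int:
    """Draw a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def preprocess(leaked: list[str], b: int) -> dict[int, set[bytes]]:
    """Offline: blind every leaked password once and bucket it by shard index."""
    shards: dict[int, set[bytes]] = defaultdict(set)
    for pw in leaked:
        digest = password_digest(pw)
        shards[shard_index(digest)].add(truncate(pow(to_group(digest), b, P)))
    return dict(shards)


def serve(shards: dict[int, set[bytes]], index: int, blinded: int, b: int) -> tuple[int, set[bytes]]:
    """Online: one exponentiation plus a table lookup, no per-request work on the dataset."""
    return pow(blinded, b, P), shards.get(index, set())


def check(password: str, shards: dict[int, set[bytes]], b: int) -> bool:
    """Client flow with sharding and truncation (the server call is inlined for the demo)."""
    digest = password_digest(password)
    a = random_exponent()
    double_blinded, shard = serve(shards, shard_index(digest), pow(to_group(digest), a, P), b)
    unblinded = pow(double_blinded, pow(a, -1, P - 1), P)  # H(p)^b
    return truncate(unblinded) in shard
```

A 2-byte prefix gives 65,536 possible indices, so with uniformly distributed hashes each response carries on average about 1/65,536 of the dataset. Truncating the 16-byte encoding to 8 bytes halves each entry. Both lengths are tuning knobs: a longer prefix shrinks responses but leaves fewer passwords sharing each index, which is the privacy and performance balance described above.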
The user experience
Foundational to Private Password Precheck’s success is the ability to perform the check in a manner that is transparent to users, avoiding any disruption to the user experience.
The entire workflow for Private Password Precheck consists of the following steps:
- User enters a new password during account creation or password reset.
- If the password passes local requirements (e.g., a minimum length), it is sent to a client library for Private Password Precheck.
- The client library generates a PDL request, sends it to the server and gets the PDL response.
- The client library performs the match locally; if a match is found, the page shows the user an alert suggesting a stronger password.
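The client-side control flow of these steps can be sketched as a single handler. The names `on_password_submitted` and `MIN_LENGTH`, the 8-character minimum, and the string outcomes are hypothetical. `private_precheck` stands in for the client library and can be any callable that returns True when the password is found in the leaked set.

```python
from typing import Callable

MIN_LENGTH = 8  # hypothetical local requirement


def on_password_submitted(password: str, private_precheck: Callable[[str], bool]) -> str:
    """Return the UI outcome for a newly entered password."""
    if len(password) < MIN_LENGTH:
        return "local_error"       # fails local requirements; no request is made
    if private_precheck(password):  # PDL request, response, and local match
        return "breach_alert"      # page suggests choosing a stronger password
    return "accepted"
```

Because the local requirements run first, only passwords that could otherwise be accepted incur the network round trip.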
The following sequence diagram demonstrates the workflow:
Offering more privacy value with PDL
Looking ahead, PDL has several interesting extensions and potential applications that could further minimize data collection. Some of these are briefly described below.
- In addition to passwords, PDL can be used to look up other client-side information, such as whether a user’s contacts are on the service, enabling private contact discovery.
- PDL can be applied to systems looking to detect malicious content and downloads within apps without revealing the content to servers.
- PDL can be extended to support key-value lookups.
PDL can also be combined with other privacy-enhancing technologies to optimize the trade-off between privacy and efficiency. For example, PDL can be used together with Anonymous Credential Service (ACS) to additionally hide the identity of the client, which improves privacy and allows more flexibility in designing our shards.