PyTorch-BigGraph: Faster embeddings of large graphs

WHAT IT IS:

A new tool from Facebook AI Research that enables training of multi-relation graph embeddings for very large graphs. PyTorch-BigGraph (PBG) handles graphs with billions of nodes and trillions of edges. Since PBG is written in PyTorch, researchers and engineers can easily swap in their own loss functions, models, and other components.
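
To show what swapping in a loss function can mean in practice, here is a minimal pure-Python sketch. It is not PBG's API. It scores a multi-relation edge (source, relation, destination) with a translation-style relation operator followed by a dot-product comparator. It then evaluates the same positive/negative edge pair under two interchangeable losses: a margin ranking loss and a logistic loss. The function names, the translation operator, and the toy embeddings are assumptions for illustration. PBG itself works on PyTorch tensors and is configured differently.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
LossFn = Callable[[float, float], float]

def translate(src: Vector, rel: Vector) -> Vector:
    """Hypothetical relation operator: shift the source embedding by a per-relation vector."""
    return [s + r for s, r in zip(src, rel)]

def dot(a: Vector, b: Vector) -> float:
    """Comparator: a higher score means the edge looks more plausible."""
    return sum(x * y for x, y in zip(a, b))

def score(src: Vector, rel: Vector, dst: Vector) -> float:
    return dot(translate(src, rel), dst)

def margin_ranking_loss(pos: float, neg: float, margin: float = 1.0) -> float:
    # Zero once the positive edge outscores the negative edge by at least `margin`.
    return max(0.0, margin - pos + neg)

def logistic_loss(pos: float, neg: float) -> float:
    # Treat the positive edge as label 1 and the negative edge as label 0.
    return math.log1p(math.exp(-pos)) + math.log1p(math.exp(neg))

def edge_loss(loss_fn: LossFn, emb: Dict[str, Vector], rel: Vector,
              positive: Tuple[str, str], negative: Tuple[str, str]) -> float:
    """Score a true edge and a corrupted edge, then apply whichever loss was plugged in."""
    pos = score(emb[positive[0]], rel, emb[positive[1]])
    neg = score(emb[negative[0]], rel, emb[negative[1]])
    return loss_fn(pos, neg)

# Toy data (hypothetical): "alice follows bob" is true; corrupting the destination gives a negative.
emb = {"alice": [1.0, 0.0], "bob": [1.0, 1.0], "carol": [-1.0, 0.0]}
follows = [0.0, 1.0]

for loss_fn in (margin_ranking_loss, logistic_loss):
    print(loss_fn.__name__, edge_loss(loss_fn, emb, follows, ("alice", "bob"), ("alice", "carol")))
```

Because the loss is an ordinary argument, trying a different objective only means passing a different callable. This is the kind of flexibility that writing the tool in PyTorch is meant to give.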

Read more on our AI blog about PBG and our first published embeddings.

WHAT IT DOES:

Engineers can generate embeddings of nodes in a graph (knowledge graphs, graphs of stock transactions, online content, etc.) without specialized computing resources like multiple GPUs or huge amounts of memory.

PBG takes as input a graph data set in the form of a list of edges. Each edge consists of a source node, a destination node, and an optional relation type. PBG shards the nodes into partitions and groups the edges into buckets according to the partitions of their endpoints. It then performs training on multiple threads (on a single machine or on multiple machines in parallel) and outputs a list of embeddings, one per unique node ID in the edge list. These embeddings can then be used as inputs for a variety of tasks, such as feeding them into FAISS to perform fast nearest-neighbor search at large scale.
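
The sharding step can be pictured with a small sketch. A hypothetical helper assigns each node to one of P partitions. Edges are then grouped into buckets keyed by (source partition, destination partition), so training on one bucket only needs the embeddings of at most two partitions in memory. The crc32-based assignment, the function names, and the tuple edge format are assumptions for illustration. They are not PBG's actual implementation or input file format.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical edge format: (source, destination, optional relation type).
Edge = Tuple[str, str, Optional[str]]

def node_partition(node: str, num_partitions: int) -> int:
    """Assign a node to a partition with a stable hash (crc32), so results are reproducible across runs."""
    return zlib.crc32(node.encode("utf-8")) % num_partitions

def bucket_edges(edges: List[Edge], num_partitions: int) -> Dict[Tuple[int, int], List[Edge]]:
    """Group edges by (source partition, destination partition); there are at most P * P buckets."""
    buckets: Dict[Tuple[int, int], List[Edge]] = defaultdict(list)
    for src, dst, rel in edges:
        key = (node_partition(src, num_partitions), node_partition(dst, num_partitions))
        buckets[key].append((src, dst, rel))
    return dict(buckets)

edges: List[Edge] = [
    ("alice", "bob", "follows"),
    ("bob", "carol", "follows"),
    ("carol", "acme", "works_at"),
    ("dave", "alice", None),  # relation type omitted
]

for (src_part, dst_part), bucket in sorted(bucket_edges(edges, num_partitions=2).items()):
    # Training on this bucket only touches partitions src_part and dst_part.
    print(f"bucket ({src_part}, {dst_part}): {len(bucket)} edge(s)")
```

Raising the number of partitions shrinks how much of the embedding table must be resident at once. In a multi-machine setting, buckets that share no partitions can, roughly speaking, be trained at the same time without conflicting.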

WHY IT MATTERS:

Embeddings are an important unsupervised learning technique in AI. Many practitioners have large graphs with multiple relation types and multiple entity types. Graph embeddings let them turn that data into fixed-size vectors that ML algorithms can use as input features.

PBG scales graph embedding algorithms from the literature to extremely large graphs. Compared with commonly used embedding software, PBG is robust, scalable, and highly optimized. It is often orders of magnitude faster, and it produces embeddings of comparable quality to state-of-the-art models on standard benchmarks.

We hope that PBG will be a useful tool for researchers, smaller companies, and organizations that may have large graph data sets but lack the tools to apply this data efficiently to their ML applications. We encourage developers to release and experiment with larger data sets, and we hope that unsupervised learning on massive graphs may eventually lead to better algorithms for inference on graph-structured data.