Llama 3 is Meta’s most capable openly available LLM to date, and the recently released Llama 3.1 enables new workflows, such as synthetic data generation and model distillation, with flexibility, control, and state-of-the-art capabilities that rival the best closed-source models.

At AI Infra @ Scale 2024, Meta engineers discussed every step of how we built and brought Llama 3 to life, from data and training to inference. 

Joe Spisak, Product Director and Head of Generative AI Open Source at Meta, talks about the history of Llama and Meta’s overarching vision for open source AI.

He’s joined by Delia David, a software engineer at Meta, to discuss all things data-related for GenAI. David covers the diversity, volume, and freshness of data that GenAI requires, and how different data types should be extracted and prepared.

Kaushik Veeraraghavan, a software engineer at Meta, discusses how Meta trains Llama at scale and delves into the data center, networking, and software investments that have enabled the development of Meta’s Llama 3 models.

Finally, Ye (Charlotte) Qi, a production engineer at Meta, discusses how Meta handles inference for Llama. Optimizing and scaling LLM inference is essential for enabling large-scale product applications. Qi introduces key parallelism techniques that help scale model sizes and context windows, which in turn shape inference system design. She also discusses practical challenges of deploying these complex serving paradigms across Meta’s internal cloud onto heterogeneous data center hardware.
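To make one of the parallelism techniques Qi describes concrete, below is a minimal single-process sketch of tensor parallelism: a linear layer’s weight matrix is sharded column-wise across devices, each device computes a slice of the output, and the slices are gathered. The shapes, shard count, and NumPy simulation here are illustrative assumptions for exposition, not a description of Meta’s serving stack.

```python
import numpy as np

def column_parallel_linear(x, weight, num_shards):
    """Simulate tensor parallelism for a linear layer on one process.

    The weight is split column-wise into `num_shards` slices. In a real
    deployment each slice would live on a separate GPU, and the final
    concatenation would be an all-gather across devices.
    """
    shards = np.split(weight, num_shards, axis=1)    # one weight slice per "device"
    partial_outputs = [x @ w for w in shards]        # each device computes its slice locally
    return np.concatenate(partial_outputs, axis=-1)  # all-gather of the output slices

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 512))          # (batch, hidden) activations
weight = rng.standard_normal((512, 2048))  # e.g. an MLP up-projection

sharded = column_parallel_linear(x, weight, num_shards=4)
assert np.allclose(sharded, x @ weight)  # sharded compute matches the dense result
```

Because each output column depends only on the corresponding weight column, column sharding reproduces the dense result exactly; the cost is the communication step to gather the slices, which is one of the design trade-offs inference systems at this scale must balance.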
