Scale matters. That was the common lens as engineers from Facebook, Google, LinkedIn, Microsoft, Pinterest, Uber, and Yandex presented their approaches to the technical challenges of large-scale storage systems and analytics at today’s Data @Scale conference in Seattle.
Kicking off the conference, Facebook engineer Pieter Noordhuis shared insights from a newly released paper, “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.” The paper demonstrates how creative infrastructure design can contribute to more efficient deep learning at scale.
The paper is the result of a close collaboration between the Facebook AI Research and Applied Machine Learning groups. It describes how Facebook researchers reduced the time to train an image classification model on the ImageNet-1k dataset of over 1.2 million images from multiple days to one hour, while maintaining leading classification accuracy. The team achieved this result with Caffe2 and the Gloo library for collective communication (both available on GitHub) and with Big Basin, Facebook’s next-generation GPU server, whose design was contributed to the Open Compute Project earlier this year.
With these findings, machine learning researchers can iterate much faster: experimenting, testing hypotheses, and driving the evolution of a range of dependent technologies, from fun face filters to 360 video to augmented reality.