Zero-shot learning: Using text to accurately ID images

WHAT THE RESEARCH IS:

Zero-shot learning (ZSL) is a technique in which a machine learns to recognize classes of objects for which it has seen no training examples. Researchers at Facebook have developed a new, more accurate ZSL model that uses neural net architectures called generative adversarial networks (GANs) to read and analyze text articles, and then visually identify the objects they describe. This novel approach to ZSL allows machines to classify objects based on category and then use that information to identify other similar objects, rather than learning each object individually, as other models do.

HOW IT WORKS:

Researchers trained this model, called generative adversarial zero-shot learning (GAZSL), to identify more than 600 classes of birds across two databases containing more than 60,000 images. The model was then given web articles and asked to use the information in them to identify birds it had not seen before. The model extracted seven key visual features from the text, created synthetic visualizations of these features, and used those features to identify the correct class of bird.
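
The general idea of that last step can be illustrated with a small sketch. This is not the researchers' implementation: a fixed linear map plus random noise stands in for a trained GAN generator, and the class names, text embeddings, weights, and image features below are all hypothetical toy values chosen for illustration. The sketch generates synthetic visual features for each unseen class from its text description, averages them into one prototype per class, and assigns an image to the class with the nearest prototype.

```python
import random


def generate_visual_features(text_embedding, weights, n_samples, noise_scale, rng):
    """Stand-in for a trained generator (assumption): a linear map of the
    text embedding plus Gaussian noise, sampled n_samples times."""
    samples = []
    for _ in range(n_samples):
        sample = []
        for row in weights:
            base = sum(w * t for w, t in zip(row, text_embedding))
            sample.append(base + rng.gauss(0.0, noise_scale))
        samples.append(sample)
    return samples


def centroid(vectors):
    """Average a list of equal-length vectors into one prototype vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_unseen_class_centroids(text_embeddings, weights,
                                 n_samples=50, noise_scale=0.05, seed=0):
    """Create one synthetic visual prototype per unseen class from its text."""
    rng = random.Random(seed)
    return {
        name: centroid(
            generate_visual_features(emb, weights, n_samples, noise_scale, rng)
        )
        for name, emb in text_embeddings.items()
    }


def classify(image_features, class_centroids):
    """Return the class whose synthetic prototype is closest to the image."""
    return min(
        class_centroids,
        key=lambda name: squared_distance(image_features, class_centroids[name]),
    )


# Hypothetical toy data: 3-dimensional text embeddings for two unseen classes.
TEXT_EMBEDDINGS = {
    "cardinal": [1.0, 0.0, 0.0],
    "blue_jay": [0.0, 1.0, 0.0],
}
# Hypothetical generator weights mapping 3 text dimensions to 2 visual dimensions.
WEIGHTS = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]

centroids = build_unseen_class_centroids(TEXT_EMBEDDINGS, WEIGHTS)
prediction = classify([0.9, 0.1], centroids)  # image features are made up
print(prediction)
```

In a real system the linear map would be replaced by a generator trained adversarially on classes that do have images, and the image features would come from a vision network rather than being written by hand.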

Researchers then tested the GAZSL model against seven other ZSL algorithms and found it was consistently more accurate across four different benchmarks. Overall, the GAZSL model outperformed the other models by 4 percent to 7 percent, and in some cases by much more.

WHY IT MATTERS:

To become more useful, computer vision systems will need to recognize objects they have not specifically been trained on. For example, it is estimated that there are more than 10,000 living bird species, yet most computer vision data sets of birds have only a couple of hundred categories. This new ZSL model, which has been open-sourced, produced more accurate results than the other models it was tested against and offers a promising path for future research into machine learning. Much of the research into AI remains foundational, but work that improves how systems understand text and correctly identify objects continues to lay the groundwork for better, more reliable AI systems.