No training required: Exploring random encoders for sentence classification

WHAT THE RESEARCH IS:

A strong, novel baseline for sentence embeddings that requires no training whatsoever. We explored various methods for computing sentence representations from pretrained word embeddings without any additional training. The aim is to put sentence embeddings on a more solid footing by 1) looking at how much modern sentence embeddings gain from training (surprisingly little, as it turns out); and 2) providing the field with more appropriate (and quite strong) baselines going forward.

HOW IT WORKS:

A sentence embedding is a vector representation of a sentence: the sentence is mapped to a fixed-length vector of numbers that captures its meaning. Such embeddings are often created by transforming pretrained word embeddings through a composition function. Sentence embeddings are a hot topic in natural language processing (NLP) because they enable better text classification than word embeddings alone. Given the pace of research on sentence representations, it is important to establish solid baselines to build upon.
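To make the idea of a composition function concrete, here is a minimal sketch that composes word embeddings into a sentence embedding by simple averaging. The toy embedding table and dimensions are hypothetical; a real system would look up pretrained vectors such as GloVe or fastText.

```python
import numpy as np

# Hypothetical toy lookup table of pretrained word embeddings (dimension 4).
# In practice these would come from pretrained vectors such as GloVe or fastText.
word_embeddings = {
    "the":   np.array([0.1, 0.3, -0.2, 0.5]),
    "movie": np.array([0.7, -0.1, 0.4, 0.0]),
    "was":   np.array([0.0, 0.2, 0.1, -0.3]),
    "great": np.array([0.9, 0.8, -0.5, 0.2]),
}

def bag_of_words_embedding(sentence: str) -> np.ndarray:
    """Compose word embeddings into a sentence embedding by averaging them."""
    vectors = [word_embeddings[w] for w in sentence.lower().split() if w in word_embeddings]
    return np.mean(vectors, axis=0)

print(bag_of_words_embedding("The movie was great"))
```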

We set out to determine what was gained, if anything, by using current state-of-the-art methods rather than random methods that combine nothing but pretrained word embeddings. The power of random features has long been known in the machine learning community, so we applied the idea to this NLP task. We explored three methods: bag of random embedding projections, random LSTMs, and echo state networks. Our findings indicate that much of the representational power of sentence embeddings comes from the underlying word representations. We found that random parameterizations over pretrained word embeddings constitute a very strong baseline and sometimes even match the performance of well-known sentence encoders such as SkipThought and InferSent. These findings establish a strong baseline for research in representation learning for sentences going forward. We also made important observations about proper experimental protocol for sentence classification evaluation, together with recommendations for future research.
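As an illustration of the simplest of these methods, the bag of random embedding projections, here is a minimal sketch: each word embedding is passed through a fixed, randomly initialized projection matrix that is never trained, and the projected vectors are pooled into a single sentence vector. The dimensions, the embedding table, and the pooling choice here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for pretrained word embeddings (real systems would use GloVe/fastText).
embedding_dim, projection_dim = 4, 8
word_embeddings = {
    "this":  rng.normal(size=embedding_dim),
    "movie": rng.normal(size=embedding_dim),
    "is":    rng.normal(size=embedding_dim),
    "great": rng.normal(size=embedding_dim),
}

# Random projection matrix: initialized once and never trained.
W = rng.uniform(-0.1, 0.1, size=(projection_dim, embedding_dim))

def random_projection_encoder(sentence: str) -> np.ndarray:
    """Project each word embedding with the fixed random matrix W, then max-pool."""
    projected = [W @ word_embeddings[w] for w in sentence.lower().split() if w in word_embeddings]
    return np.max(projected, axis=0)  # element-wise max pooling over words

sentence_vec = random_projection_encoder("this movie is great")
# sentence_vec can now be fed to a simple downstream classifier, e.g. logistic regression.
```

The random LSTM and echo state network variants follow the same recipe: only the parameters of the composition function are random and fixed, while the pretrained word embeddings do the heavy lifting.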

WHY IT MATTERS:

While there has been much research on sentence encoders recently, the relationship between word embeddings and sentence embeddings remains poorly understood by NLP researchers. With the field moving so quickly, comparisons between different methods aren’t always done properly. It is important to take a step back every once in a while to try to get a deeper understanding of existing state-of-the-art methods and to analyze why they work. By providing new insights into sentence embeddings, and by establishing stronger baselines, this work improves our collective understanding of how neural networks represent and understand language. We have shared our findings on GitHub.

READ THE FULL PAPER:

No Training Required: Exploring Random Encoders for Sentence Classification
