
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations

We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, a self-supervised method for learning universal sentence embeddings that transfer to a wide variety of natural language processing (NLP) tasks. Our objective leverages recent advances in deep metric learning (DML) and has the advantage of being conceptually simple and easy to implement, requiring no specialized architectures or labelled training data. We demonstrate that our objective can be used to pretrain transformers to state-of-the-art performance on SentEval, a popular benchmark for evaluating universal sentence embeddings, outperforming existing supervised, semi-supervised and unsupervised methods. We perform extensive ablations to determine which factors contribute to the quality of the learned embeddings. Our code is publicly available and can be easily adapted to new datasets or used to embed unseen text.
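The text above does not spell out the loss itself, so the following is only a generic illustration of what a contrastive objective looks like, not necessarily the exact objective used by DeCLUTR. It is a minimal, dependency-free sketch of an InfoNCE-style loss: an anchor embedding is pulled towards a positive embedding and pushed away from negatives. The function names `cosine` and `info_nce` and the temperature value of 0.1 are assumptions made for this example.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # illustrative value, not taken from the source
) -> float:
    """Cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


# The loss is small when the positive is close to the anchor
# and large when a negative is closer than the positive.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
bad = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(good < bad)
```

In practice the embeddings would come from the encoder being pretrained, and the loss would be averaged over a batch.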

Results on SentEval are presented below (as averaged scores on the downstream and probing task test sets), along with existing state-of-the-art methods.

| Model | Requires labelled data? | Parameters | Embed. dim. | Downstream (-SNLI) | Probing | Δ |
|---|---|---|---|---|---|---|
| InferSent V2 | Yes | 38M | 4096 | 76.00 | 72.58 | -3.06 |
| Universal Sentence Encoder | Yes | 147M | 512 | 78.89 | 66.70 | -0.17 |
| Sentence Transformers ("roberta-base-nli-mean-tokens") | Yes | 125M | 768 | 77.19 | 63.22 | -1.87 |
| Transformer-small (DistilRoBERTa-base) | No | 82M | 768 | 72.58 | 74.57 | -6.48 |
| Transformer-base (RoBERTa-base) | No | 125M | 768 | 72.70 | 74.19 | -6.36 |
| DeCLUTR-small (DistilRoBERTa-base) | No | 82M | 768 | 77.41 | 74.71 | -1.65 |
| DeCLUTR-base (RoBERTa-base) | No | 125M | 768 | 79.06 | 74.65 | -- |

Transformer-* models share the underlying architecture and pretrained weights of the corresponding DeCLUTR-* models, before continued pretraining with our contrastive objective. Both Transformer-* and DeCLUTR-* use mean pooling on their token-level embeddings to produce a fixed-length sentence representation. Downstream scores are computed without considering performance on SNLI (denoted "Downstream (-SNLI)"), as InferSent, USE and Sentence Transformers all train on SNLI. Δ: difference from the DeCLUTR-base downstream score.
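The mean pooling step mentioned above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the project's own code: the function name `mean_pool` and the use of an attention mask to skip padding tokens are assumptions made for the example.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the embeddings of unmasked tokens into one fixed-length vector."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    # Keep only real tokens (mask == 1); padding tokens (mask == 0) are ignored.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The output has the same dimension as each token embedding regardless of sentence length, which is what makes the result a fixed-length sentence representation.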