DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations

Source Code

Implemented By:

John Giorgi

johnmgiorgi@gmail.com

@JohnMGiorgi
Osvald Nitski

@OsvaldNitski

Description:

We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, a self-supervised method for learning universal sentence embeddings that transfer to a wide variety of natural language processing (NLP) tasks. Our objective leverages recent advances in deep metric learning (DML) and has the advantage of being conceptually simple and easy to implement, requiring no specialized architectures or labelled training data. We demonstrate that our objective can be used to pretrain transformers to state-of-the-art performance on SentEval, a popular benchmark for evaluating universal sentence embeddings, outperforming existing supervised, semi-supervised and unsupervised methods. We perform extensive ablations to determine which factors contribute to the quality of the learned embeddings. Our code is publicly available and can be easily adapted to new datasets or used to embed unseen text.
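To make the idea of a contrastive objective concrete, here is a minimal sketch of one common contrastive loss, InfoNCE, in plain Python. The description above does not name the exact loss, so the choice of InfoNCE is an assumption. The function names (`cosine_similarity`, `info_nce_loss`), the temperature value and the toy vectors are also illustrative and are not taken from the DeCLUTR code. Each anchor embedding is scored against every candidate embedding in the batch, and the loss is low when the anchor is most similar to its own positive.

```python
import math
from typing import List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    anchors: List[List[float]],
    positives: List[List[float]],
    temperature: float = 0.1,  # assumed value, chosen only for illustration
) -> float:
    """Mean InfoNCE loss where positives[i] is the positive for anchors[i].

    All other rows of `positives` act as in-batch negatives for anchor i.
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same length")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Negative log-probability of picking the true positive.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch: two anchors, each with a matching positive.
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(toy_anchors, [[1.0, 0.0], [0.0, 1.0]])
# Same candidates, but paired with the wrong anchors.
mismatched_loss = info_nce_loss(toy_anchors, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, `aligned_loss` is close to zero while `mismatched_loss` is much larger, which is the behaviour a contrastive objective relies on to pull related text spans together in embedding space.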

Results on SentEval are presented below (as averaged scores on the downstream and probing task test sets), along with existing state-of-the-art methods.

Model	Requires labelled data?	Parameters	Embed. dim.	Downstream (-SNLI)	Probing	Δ
InferSent V2	Yes	38M	4096	76.00	72.58	-3.06
Universal Sentence Encoder	Yes	147M	512	78.89	66.70	-0.17
Sentence Transformers ("roberta-base-nli-mean-tokens")	Yes	125M	768	77.19	63.22	-1.87
Transformer-small (DistilRoBERTa-base)	No	82M	768	72.58	74.57	-6.48
Transformer-base (RoBERTa-base)	No	125M	768	72.70	74.19	-6.36
DeCLUTR-small (DistilRoBERTa-base)	No	82M	768	77.41	74.71	-1.65
DeCLUTR-base (RoBERTa-base)	No	125M	768	79.06	74.65	--

Transformer-* is the same underlying architecture and pretrained weights as DeCLUTR-* before continued pretraining with our contrastive objective. Transformer-* and DeCLUTR-* use mean pooling on their token-level embeddings to produce a fixed-length sentence representation. Downstream scores are computed without considering performance on SNLI (denoted "Downstream (-SNLI)") as InferSent, USE and Sentence Transformers all train on SNLI. Δ: difference from the DeCLUTR-base downstream score.
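The mean pooling step mentioned above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the project's implementation: the function name `mean_pool`, the use of an attention mask to skip padding tokens, and the toy two-dimensional embeddings are all hypothetical.

```python
from typing import List


def mean_pool(
    token_embeddings: List[List[float]], attention_mask: List[int]
) -> List[float]:
    """Average the embeddings of unmasked tokens into one fixed-length vector.

    token_embeddings: one vector per token, all of the same dimension.
    attention_mask: 1 for real tokens, 0 for padding (assumed convention).
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three token vectors, the last of which is padding.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
toy_mask = [1, 1, 0]
sentence_embedding = mean_pool(toy_tokens, toy_mask)
```

Whatever the number of tokens, the output has the same length as a single token embedding (768 for the models in the table above), which is what makes the result a fixed-length sentence representation.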

Related Papers:
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
Tags:
- sentence embeddings
- metric learning
AllenNLP Version: 1.1.0
Languages: en
Datasets:
- OpenWebText
Submitted On Mar 26, 2021