ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

This model applies ViLBERT to the visual question answering (VQA) task. It does not reproduce the pre-training regime from the original paper and therefore does not reproduce the paper's scores.

It achieves an F1 of 41% and a VQA score of 52%. If you want to produce state-of-the-art (SOTA) scores with this model, please contact the AllenNLP team at allennlp-contact@allenai.org. There are easy ways to get SOTA scores with ViLBERT that don't involve the pre-training regime from the original paper.
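
For readers unfamiliar with the VQA score reported above, the sketch below shows the commonly cited simplified form of the metric: a predicted answer earns min(1, n/3) credit, where n is the number of human annotators who gave that exact answer. This is an illustration only. The function name `vqa_score` is hypothetical, and whether this implementation computes its score in exactly this way (including any answer normalization) is an assumption.

```python
from typing import List


def vqa_score(predicted: str, human_answers: List[str]) -> float:
    """Simplified VQA accuracy for one question (hypothetical helper).

    An answer gets full credit if at least 3 annotators gave it,
    and partial credit (n / 3) if fewer did.
    """
    # Count annotators whose answer matches the prediction exactly.
    # Assumption: answers are already normalized (case, punctuation).
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(1.0, matches / 3.0)


def mean_vqa_score(predictions: List[str], answers: List[List[str]]) -> float:
    """Average the per-question score over a dataset."""
    scores = [vqa_score(p, a) for p, a in zip(predictions, answers)]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition, a dataset-level score of 52% is the mean of the per-question scores, so partially correct answers contribute fractional credit.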

This model is maintained by the AllenNLP team and its contributors on the AllenNLP models repository at https://github.com/allenai/allennlp-models.