ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
This work implements the VQA task with ViLBERT. It does not reproduce the pre-training regime from the original paper, and therefore does not reproduce the scores reported there.
It achieves 41% F1 and a 52% VQA score. If you want to produce SOTA scores with this model, please contact the AllenNLP team at firstname.lastname@example.org. There are easy ways to get SOTA scores with ViLBERT that don't involve the pre-training regime from the original paper.
This model is maintained by the AllenNLP team and its contributors in the AllenNLP models repository at https://github.com/allenai/allennlp-models.