ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
This work implements the VQA task with ViLBERT. It does not reproduce the pre-training regime from the original paper, and therefore does not reproduce the scores reported there.
It achieves 41% F1 and a 52% VQA score. If you want to produce SOTA scores with this model, please contact the AllenNLP team at firstname.lastname@example.org. There are easy ways to get SOTA scores with ViLBERT that don't involve the pre-training regime from the original paper.
This model is maintained by the AllenNLP team and its contributors in the AllenNLP models repository at https://github.com/allenai/allennlp-models.