Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

This is the official implementation of OVQA (ICCV 2023). The paper is available on arXiv.

Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim.

Department of Computer Science and Engineering, Korea University

Figure: (a) Closed-vocabulary Video Question Answering vs. (b) Open-vocabulary Video Question Answering (Ours)

Setup

To install requirements, run:

conda create -n ovqa python=3.8
conda activate ovqa
sh setup.sh
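
To quickly verify the environment, you can run a sanity check like the one below. This is our own suggestion, not a script shipped with the repo; it assumes setup.sh installs PyTorch, which train.py requires.

# Optional sanity check (not part of the repo): confirm PyTorch is
# installed and a GPU is visible, since training uses torch.distributed.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"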

Data Preparation

Download the preprocessed data and visual features

The pretrained checkpoint, preprocessed data, and data annotations are provided here. You can download the pretrained DeBERTa-v2-xlarge model here.

Then, place the files as follows:

./pretrained
   |─ pretrained.pth
   └─ deberta-v2-xlarge
./meta_data
   |─ activitynet
   │   |─ train.csv
   │   |─ test.csv
   │   |─ train_vocab.json
   │   |─ test_vocab.json
   │   |─ clipvitl14.pth
   │   |─ subtitles.pkl
   │   |─ ans2cat.json
   │   └─ answer_graph
   │       |─ train_edge_index.pth
   │       |─ train_x.pth
   │       |─ test_edge_index.pth
   │       └─ test_x.pth
   │
   |─ msvd
   │   |─ train.csv
   │   |─ test.csv
   │   |─ train_vocab.json
   │   |─ test_vocab.json
   │   |─ clipvitl14.pth
   │   |─ subtitles.pkl
   │   |─ ans2cat.json
   │   └─ answer_graph
   │       |─ train_edge_index.pth
   │       |─ train_x.pth
   │       |─ test_edge_index.pth
   │       └─ test_x.pth
   │   :
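
Before training, it can help to confirm your files match this layout. Below is a minimal check of our own (not a script shipped with the repo); it lists only a few of the ActivityNet-QA paths from the tree above, so extend it for the other datasets as needed.

# Optional layout check (our own sketch, not part of the repo):
# report any expected file that is missing.
for f in pretrained/pretrained.pth \
         meta_data/activitynet/train.csv \
         meta_data/activitynet/test.csv \
         meta_data/activitynet/clipvitl14.pth \
         meta_data/activitynet/answer_graph/train_x.pth; do
  [ -e "$f" ] && echo "OK      $f" || echo "MISSING $f"
done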

Train FrozenBiLM+ on OVQA

To train on ActivityNet-QA, MSVD-QA, TGIF-QA, or MSRVTT-QA, run the command below. Modify --dataset activitynet to change the dataset, as shown after the command.

python -m torch.distributed.launch --nproc_per_node 4 --use_env train.py --dist-url tcp://127.0.0.1:12345 \
--dataset activitynet --lr 5e-5 --batch_size 8 --batch_size_test 32 --save_dir ./path/to/save/files --epochs 20 --eps 0.7
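
For example, to train on MSVD-QA instead, swap the dataset flag. The other hyperparameters here simply mirror the ActivityNet-QA command above; per-dataset settings may differ, so treat this as a sketch.

python -m torch.distributed.launch --nproc_per_node 4 --use_env train.py --dist-url tcp://127.0.0.1:12345 \
--dataset msvd --lr 5e-5 --batch_size 8 --batch_size_test 32 --save_dir ./path/to/save/files --epochs 20 --eps 0.7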

Acknowledgements

This repo is built upon FrozenBiLM.

Citation

@inproceedings{ko2023open,
  title={Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models},
  author={Ko, Dohwan and Lee, Ji Soo and Choi, Miso and Chu, Jaewon and Park, Jihwan and Kim, Hyunwoo J},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}
