Our training data consists of video instruction data and image instruction data. You can download our preprocessed instruction data from here.
- video instruction
We use the video instruction data from PLLaVA
Instruction: You can download from magic_jsons
Videos:
Note: The preprocessed links come from this issue, provided by the authors of VideoChat2. If a link is no longer available, please refer to the original VideoChat2.
- VideoChat Preprocessed download link
- VideoChatGPT Direct download link
- Kinetics-710 Alternative download link
- SthSthV2
- NExTQA
- CLEVRER Direct download links
- YouCook2 Preprocessed download link
- TextVR
- TGIF
- EgoQA Preprocessed download link
- image instruction
Instruction: LLaVA-Instruct
Images: LLaVA-Instruct
We mix the video and image instruction data. The files are organized in the following structure:
playground/data/
├── llava_videochat2_filter.json
├── video/videochat
│   ├── clevrer/video_train
│   ├── egoqa
│   ├── kinetic
│   │   ├── k400
│   │   ├── k600
│   │   └── k700
│   ├── nextqa
│   ├── ssv2
│   ├── textvr
│   │   ├── Activity
│   │   ├── Cooking
│   │   ├── Driving
│   │   ├── Games
│   │   ├── News_Movie
│   │   ├── Sports
│   │   ├── Street_View_Indoor
│   │   └── Street_View_Outdoor
│   ├── tgif
│   ├── videochat2
│   ├── videochatgpt
│   └── youcook
│       ├── train
│       └── validation
├── image
│   ├── coco/train2017
│   ├── gqa/images
│   ├── ocr_vqa/images
│   ├── textvqa/train_images
│   └── vg
│       ├── VG_100K
│       └── VG_100K_2
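Before launching training, it can help to confirm that the downloads landed where the layout above expects them. The following is a minimal sketch and not part of the repository: the helper `find_missing` and the `EXPECTED` list are hypothetical names, and the nesting of the dataset folders under `video/videochat` and `image` is an assumption read off the tree markers.

```python
from pathlib import Path

# Relative paths copied from the layout above (assumed nesting; adjust as needed).
EXPECTED = [
    "llava_videochat2_filter.json",
    "video/videochat/clevrer/video_train",
    "video/videochat/egoqa",
    "video/videochat/kinetic/k400",
    "video/videochat/kinetic/k600",
    "video/videochat/kinetic/k700",
    "video/videochat/nextqa",
    "video/videochat/ssv2",
    "video/videochat/textvr",
    "video/videochat/tgif",
    "video/videochat/videochat2",
    "video/videochat/videochatgpt",
    "video/videochat/youcook/train",
    "video/videochat/youcook/validation",
    "image/coco/train2017",
    "image/gqa/images",
    "image/ocr_vqa/images",
    "image/textvqa/train_images",
    "image/vg/VG_100K",
    "image/vg/VG_100K_2",
]


def find_missing(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing("playground/data")
    for rel in missing:
        print("missing:", rel)
    print("all present" if not missing else f"{len(missing)} entries missing")
```

The check only tests for existence, so an empty or partially extracted folder would still pass.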
- EgoPlan-test: We cut the video clips by the start_frame and end_frame. Preprocessed download link
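If the preprocessed clips are unavailable, cutting by frame index could be reproduced along these lines. This is only a sketch under stated assumptions: the source does not say which tool was used, so ffmpeg's `trim` filter is swapped in here; `build_cut_command` and the file names are hypothetical; and `end_frame` is assumed to be inclusive.

```python
import shlex


def build_cut_command(src, dst, start_frame, end_frame):
    """Build an ffmpeg command that keeps frames start_frame..end_frame (inclusive)."""
    if start_frame < 0 or end_frame < start_frame:
        raise ValueError("expected 0 <= start_frame <= end_frame")
    # The trim filter treats end_frame as the first frame to drop, so add 1
    # to keep end_frame itself. setpts resets timestamps so the clip starts at 0.
    vf = (
        f"trim=start_frame={start_frame}:end_frame={end_frame + 1},"
        "setpts=PTS-STARTPTS"
    )
    # -an drops the audio track (an assumption: only the frames are needed).
    return ["ffmpeg", "-y", "-i", str(src), "-vf", vf, "-an", str(dst)]


# Hypothetical file names and frame range, for illustration only.
cmd = build_cut_command("full_video.mp4", "clip.mp4", 120, 360)
print(shlex.join(cmd))
```

The function only builds the argument list; passing it to `subprocess.run(cmd, check=True)` would perform the cut, provided ffmpeg is installed.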
These files are organized in the following structure
playground/eval/GPT_Zero_Shot_QA
├── EgoSchema_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
├── NExT_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
├── EgoPlan_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
└── MVBench_Zero_Shot_QA
    ├── videos
    │   ├── clevrer
    │   ├── FunQA_test
    │   ├── Moments_in_Time_Raw
    │   ├── nturgbd
    │   ├── perception
    │   ├── scene_qa
    │   ├── ssv2_video
    │   ├── sta
    │   ├── star
    │   ├── tvqa
    │   └── vlnqa
    ├── test_q.json
    └── test_a.json
To fit our evaluation pipeline, we reformat these texts into two files, test_q.json and test_a.json, in the following format:
test_q.json
[
  {
    "video_name": "",
    "question_id": "",
    "question": "",
    "option": {
      "option 0": "",
      "option 1": "",
      "option 2": "",
      "option 3": "",
      "option 4": ""
    },
    "type": ""
  }
]
test_a.json
[
  {
    "answer": 0,
    "question_id": ""
  }
]
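To make the two-file format concrete, here is a minimal sketch of how raw multiple-choice records might be split into test_q.json and test_a.json. It is not the repository's conversion script: `split_qa`, `write_split`, and the input record schema (`options` as a list, `answer` as the index of the correct option) are assumptions made for illustration.

```python
import json
from pathlib import Path


def split_qa(records):
    """Split raw multiple-choice records into test_q / test_a style lists."""
    questions, answers = [], []
    for rec in records:
        questions.append({
            "video_name": rec["video_name"],
            "question_id": rec["question_id"],
            "question": rec["question"],
            # Keys follow the "option 0", "option 1", ... pattern shown above.
            "option": {f"option {i}": text for i, text in enumerate(rec["options"])},
            "type": rec.get("type", ""),
        })
        answers.append({
            "answer": rec["answer"],  # assumed: index of the correct option
            "question_id": rec["question_id"],
        })
    return questions, answers


def write_split(records, out_dir):
    """Write test_q.json and test_a.json into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    questions, answers = split_qa(records)
    (out_dir / "test_q.json").write_text(json.dumps(questions, indent=2))
    (out_dir / "test_a.json").write_text(json.dumps(answers, indent=2))


# Hypothetical record, for illustration only.
example = [{
    "video_name": "clip_0001",
    "question_id": "q_0001",
    "question": "What does the person pick up?",
    "options": ["a cup", "a knife", "a phone", "a book", "a towel"],
    "answer": 1,
    "type": "action",
}]
questions, answers = split_qa(example)
print(json.dumps(questions[0], indent=2))
```

Writing through `json.dumps` also guarantees valid JSON, so the output never contains trailing commas that a strict parser would reject.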