# Data

## Instruction Data

Our training data consist of video instruction data and image instruction data. You can download our preprocessed instruction data from here.

1. Video instruction

We use the video instruction data from PLLaVA.

Instruction: you can download it from magic_jsons

Videos:

Note: the preprocessed links come from this issue, provided by the authors of VideoChat2. If the links are unavailable, please refer to the original VideoChat2 repository.

2. Image instruction

Instruction: LLaVA-Instruct

Images: LLaVA-Instruct

We mix the video instruction data and the image instruction data. These files are organized in the following structure:

```
playground/data/
├── llava_videochat2_filter.json
├── video/videochat
│   ├── clevrer/video_train
│   ├── egoqa
│   ├── kinetic
│   │   ├── k400
│   │   ├── k600
│   │   └── k700
│   ├── nextqa
│   ├── ssv2
│   ├── textvr
│   │   ├── Activity
│   │   ├── Cooking
│   │   ├── Driving
│   │   ├── Games
│   │   ├── News_Movie
│   │   ├── Sports
│   │   ├── Street_View_Indoor
│   │   └── Street_View_Outdoor
│   ├── tgif
│   ├── videochat2
│   ├── videochatgpt
│   └── youcook
│       ├── train
│       └── validation
└── image
    ├── coco/train2017
    ├── gqa/images
    ├── ocr_vqa/images
    ├── textvqa/train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2
```

## Evaluation Data

These files are organized in the following structure:

```
playground/eval/GPT_Zero_Shot_QA
├── EgoSchema_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
├── NExT_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
├── EgoPlan_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
└── MVBench_Zero_Shot_QA
    ├── videos
    │   ├── clevrer
    │   ├── FunQA_test
    │   ├── Moments_in_Time_Raw
    │   ├── nturgbd
    │   ├── perception
    │   ├── scene_qa
    │   ├── ssv2_video
    │   ├── sta
    │   ├── star
    │   ├── tvqa
    │   └── vlnqa
    ├── test_q.json
    └── test_a.json
```

To fit our evaluation pipeline, we reformat the benchmark annotations into two files, test_q.json and test_a.json, in the following format:

test_q.json

```json
[
    {
        "video_name": "",
        "question_id": "",
        "question": "",
        "option": {
            "option 0": "",
            "option 1": "",
            "option 2": "",
            "option 3": "",
            "option 4": ""
        },
        "type": ""
    }
]
```

test_a.json

```json
[
    {
        "answer": 0,
        "question_id": ""
    }
]
```