# Data

## Instruction Data

Our training data consist of video instruction data and image instruction data. You can download our preprocessed instruction data from here.

1. Video instruction

We use the video instruction data from PLLaVA.

Instruction: you can download it from magic_jsons

Videos:

Note: the preprocessed links come from this issue, provided by the authors of VideoChat2. If the links are unavailable, please refer to the original VideoChat2 repository.

2. Image instruction

Instruction: LLaVA-Instruct

Images: LLaVA-Instruct

We mix the video instruction data and the image instruction data. These files are organized in the following structure:

```
playground/data/
├── llava_videochat2_filter.json
├── video/videochat
│   ├── clevrer/video_train
│   ├── egoqa
│   ├── kinetic
│   │   ├── k400
│   │   ├── k600
│   │   └── k700
│   ├── nextqa
│   ├── ssv2
│   ├── textvr
│   │   ├── Activity
│   │   ├── Cooking
│   │   ├── Driving
│   │   ├── Games
│   │   ├── News_Movie
│   │   ├── Sports
│   │   ├── Street_View_Indoor
│   │   └── Street_View_Outdoor
│   ├── tgif
│   ├── videochat2
│   ├── videochatgpt
│   └── youcook
│       ├── train
│       └── validation
└── image
    ├── coco/train2017
    ├── gqa/images
    ├── ocr_vqa/images
    ├── textvqa/train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2
```

## Evaluation Data

These files are organized in the following structure:

```
playground/eval/GPT_Zero_Shot_QA
├── EgoSchema_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
├── NExT_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
├── EgoPlan_Zero_Shot_QA
│   ├── videos
│   ├── test_q.json
│   └── test_a.json
└── MVBench_Zero_Shot_QA
    ├── videos
    │   ├── clevrer
    │   ├── FunQA_test
    │   ├── Moments_in_Time_Raw
    │   ├── nturgbd
    │   ├── perception
    │   ├── scene_qa
    │   ├── ssv2_video
    │   ├── sta
    │   ├── star
    │   ├── tvqa
    │   └── vlnqa
    ├── test_q.json
    └── test_a.json
```

To fit our evaluation pipeline, we reformat the benchmark annotations into two files, test_q.json and test_a.json, in the following format:

test_q.json

```json
[
    {
        "video_name": "",
        "question_id": "",
        "question": "",
        "option": {
            "option 0": "",
            "option 1": "",
            "option 2": "",
            "option 3": "",
            "option 4": ""
        },
        "type": ""
    }
]
```

test_a.json

```json
[
    {
        "answer": 0,
        "question_id": ""
    }
]
```