To set up the environments, follow the instructions in the existing repositories and download the necessary checkpoints. The notes below supplement those instructions and cover common pitfalls such as package version conflicts and system-related problems.
- (optional) Checkpoints can be downloaded manually; otherwise, each model downloads them automatically as long as the Internet connection works.
- Use `export DECORD_EOF_RETRY_MAX=20480` to prevent possible issues with decord.
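The variable must be visible to the process that launches evaluation; a minimal sketch of setting and sanity-checking it (the `echo` line is only for illustration — for a permanent fix, add the `export` line to your shell profile, e.g. `~/.bashrc`):

```shell
# Raise decord's EOF retry limit so long or slightly corrupt videos
# do not abort decoding with an EOF error.
export DECORD_EOF_RETRY_MAX=20480

# Sanity check: confirm the variable is set in the current shell.
echo "DECORD_EOF_RETRY_MAX=${DECORD_EOF_RETRY_MAX}"
```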
- Video-ChatGPT
- Installation: Instruction
- Checkpoints:
- Source: Video-ChatGPT-7B, LLaVA-Lightening-7B-v1-1, clip-vit (optional)
- Structure:
├── checkpoints/Video-ChatGPT-7B
│   ├── LLaVA-7B-Lightening-v1-1
│   ├── Video-ChatGPT-7B
│   └── clip-vit-large-patch14 (optional)
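Before running evaluation it can help to confirm the layout above is in place. A minimal sketch (the `mkdir -p` line only creates an empty demo skeleton for illustration — the real weights must come from the download sources listed above):

```shell
# Expected Video-ChatGPT checkpoint layout (names taken from the tree above).
base=checkpoints/Video-ChatGPT-7B
mkdir -p "$base/LLaVA-7B-Lightening-v1-1" "$base/Video-ChatGPT-7B"  # demo skeleton only

# Report which required sub-directories are present.
for d in LLaVA-7B-Lightening-v1-1 Video-ChatGPT-7B; do
  if [ -d "$base/$d" ]; then echo "ok: $d"; else echo "missing: $d"; fi
done
```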
- Valley
- Installation: Instruction
- Checkpoints:
- Source: Valley2-7b
- Structure:
├── checkpoints/Valley2-7b
- Video-LLaMA-2
- Installation: Instruction
- Possible Issues:
torchaudio error: `OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory`
--> Solution: reinstall torchaudio
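When reinstalling, torchaudio must match the installed torch build. A hedged sketch that only derives and prints the reinstall command (assumption: for torch 1.x, the matching torchaudio version is `0.` followed by torch's minor and patch numbers, e.g. torch 1.13.1 pairs with torchaudio 0.13.1):

```shell
# Derive the torchaudio pin matching the torch version used in this guide.
torch_ver="1.13.1"              # torch version pinned elsewhere in this guide
audio_ver="0.${torch_ver#1.}"   # strip the leading "1." -> 0.13.1
echo "pip install --force-reinstall torchaudio==${audio_ver}"
```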
- Checkpoints:
- Source: Video-LLaMA-2-7B-Finetuned/Video-LLaMA-2-13B-Finetuned, VIT (optional), qformer (optional)
- Structure:
├── checkpoints/Video-LLaMA-2-7B-Finetuned
│   ├── AL_LLaMA_2_7B_Finetuned.pth
│   ├── imagebind_huge.pth
│   ├── llama-2-7b-chat-hf
│   ├── VL_LLaMA_2_7B_Finetuned.pth
│   ├── blip2_pretrained_flant5xxl.pth (optional)
│   └── eva_vit_g.pth (optional)
├── checkpoints/Video-LLaMA-2-13B-Finetuned
│   ├── AL_LLaMA_2_13B_Finetuned.pth
│   ├── imagebind_huge.pth
│   ├── llama-2-13b-chat-hf
│   ├── VL_LLaMA_2_13B_Finetuned.pth
│   ├── blip2_pretrained_flant5xxl.pth (optional)
│   └── eva_vit_g.pth (optional)
- VideoChat2
- Installation: Instruction
- Possible Issue 1:
`ERROR: Could not find a version that satisfies the requirement torch==1.13.1+cu117`
--> Solution: `pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117`
- Possible Issue 2:
flash-attention installation error
--> Solution: flash-attention is not required for inference and can be skipped
- Checkpoints:
- Source: llama-7b, UMT-L-Qformer, VideoChat2_7B_stage2, VideoChat2_7B_stage3, Vicuna-7B-delta + script
- Structure:
├── checkpoints/VideoChat2
│   ├── umt_l16_qformer.pth
│   ├── videochat2_7b_stage2.pth
│   ├── videochat2_7b_stage3.pth
│   └── vicuna-7b-v0
- Video-LLaVA
- Installation: Instruction
- Checkpoints:
- Source: Video-LLaVA-7B, LanguageBind_Video (optional), LanguageBind_Image (optional)
- Structure:
├── checkpoints/VideoLLaVA
│   ├── Video-LLaVA-7B
│   ├── LanguageBind_Video_merge (optional)
│   └── LanguageBind_Image (optional)
- Video-LaVIT
- Installation: Instruction
- Possible Issue:
the package `motion-vector-extractor` is not supported on CentOS
--> No alternative on CentOS
- Possible Issue:
missing `accelerate` and `apex`
--> Solution: `pip install accelerate`; build apex from the official NVIDIA/apex source
- Checkpoints:
- Source: Video-LaVIT-v1
- Structure:
├── checkpoints/Video-LaVIT-v1/language_model_sft
- LLaMA-VID
- Installation: Instruction
- Checkpoints:
- Source: llama-vid-7b-full-224-video-fps-1, llama-vid-13b-full-224-video-fps-1, eva_vit_g, bert (optional)
- Structure:
├── checkpoints/LLaMA-VID-7B
│   ├── llama-vid-7b-full-224-video-fps-1
│   ├── LAVIS/eva_vit_g.pth
│   └── bert-base-uncased (optional)
├── checkpoints/LLaMA-VID-13B
│   ├── llama-vid-13b-full-224-video-fps-1
│   ├── LAVIS/eva_vit_g.pth
│   └── bert-base-uncased (optional)
- MiniGPT4-Video
- Installation: Instruction
- Checkpoints:
- Source: video_mistral_checkpoint_last, Mistral-7B-Instruct-v0.2, vit (optional)
- Structure:
├── checkpoints/MiniGPT4-Video
│   ├── checkpoints/video_mistral_checkpoint_last.pth
│   ├── Mistral-7B-Instruct-v0.2
│   └── eva_vit_g.pth (optional)
- PLLaVA
- Installation: Instruction
- Checkpoints:
- Source: pllava-7b, pllava-13b, pllava-34b
- Structure:
├── checkpoints/PLLaVA
│   ├── pllava-7b
│   ├── pllava-13b
│   └── pllava-34b
- LLaVA-NeXT-Video
- Installation: Instruction
- Checkpoints:
- Structure:
├── checkpoints/PLLaVA
│   ├── LLaVA-NeXT-Video-7B-DPO
│   └── LLaVA-NeXT-Video-34B-DPO
- ShareGPT4Video
- Installation: Instruction
- Checkpoints:
- Source: sharegpt4video-8b
- Structure:
├── checkpoints/ShareGPT4Video
│   └── sharegpt4video-8b
- GPT-4V
- Installation: create a `.env` file under `baselines/gpt4v` and set `API_BASE` and `API_KEY`
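The `.env` setup above can be sketched as follows (the values are placeholders — `API_BASE` here assumes an OpenAI-compatible endpoint; substitute your own endpoint and key):

```shell
# Create the .env file the GPT-4V baseline reads its credentials from.
mkdir -p baselines/gpt4v
cat > baselines/gpt4v/.env <<'EOF'
API_BASE=https://api.openai.com/v1
API_KEY=your-key-here
EOF

# Confirm both required keys are present.
grep -c '^API_' baselines/gpt4v/.env   # prints 2
```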