
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

StreamingBench evaluates Multimodal Large Language Models (MLLMs) on real-time, streaming video understanding tasks. 🌟

🎞️ Overview

As MLLMs continue to advance, they remain largely focused on offline video comprehension, where all frames are pre-loaded before queries are made. This falls far short of the human ability to process a video stream as it unfolds and respond in real time to its dynamic content. To bridge this gap, StreamingBench introduces the first comprehensive benchmark for streaming video understanding in MLLMs.

Key Evaluation Aspects

  • 🎯 Real-time Visual Understanding: Can the model process and respond to visual changes in real-time?
  • 🔊 Omni-source Understanding: Does the model integrate visual and audio inputs synchronously in real-time video streams?
  • 🎬 Contextual Understanding: Can the model comprehend the broader context within video streams?

Dataset Statistics

  • 📊 900 diverse videos
  • 📝 4,500 human-annotated QA pairs
  • ⏱️ Five questions per video at different timestamps

🎬 Video Categories

(figure: video categories)

🔍 Task Taxonomy

(figure: task taxonomy)

📐 Dataset Examples

(video: example.mp4)

🔮 Evaluation Pipeline

Requirements

  • Python 3.x
  • moviepy

Data Preparation

  1. Download Dataset: Retrieve all necessary files from the StreamingBench Dataset.

  2. Decompress Files: Extract the downloaded files and organize them in the ./data directory as follows:

    StreamingBench/
    ├── data/
    │   ├── real/               # Unzip Real Time Visual Understanding_*.zip into this folder
    │   ├── omni/               # Unzip other .zip files into this folder
    │   ├── sqa/                # Unzip Sequential Question Answering_*.zip into this folder
    │   └── proactive/          # Unzip Proactive Output_*.zip into this folder
    
  3. Preprocess Data: Run the following command to preprocess the data:

    cd ./scripts
    bash preprocess.sh
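
The preprocessing step presumably builds on moviepy, the only Python dependency listed under Requirements; the exact operations are defined in preprocess.sh. Purely as an illustration of the kind of step such a pipeline needs, the sketch below trims a source video to the content available before a question's timestamp. The file paths and timestamp are hypothetical, not taken from the dataset.

    # Hypothetical illustration (not part of preprocess.sh): keep only the
    # content available before a question's timestamp.
    from moviepy.editor import VideoFileClip

    def clip_before_timestamp(src_path, dst_path, query_time_s):
        """Write the sub-clip [0, query_time_s) of src_path to dst_path."""
        with VideoFileClip(src_path) as clip:
            end = min(query_time_s, clip.duration)  # don't run past the end of the video
            clip.subclip(0, end).write_videofile(dst_path)

    # Example call with placeholder paths:
    clip_before_timestamp("data/real/example.mp4", "data/real/example_trimmed.mp4", 60.0)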

Model Preparation

Prepare your own model for evaluation by following the instructions provided here. This guide will help you set up and configure your model to ensure it is ready for testing against the dataset.
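
The precise interface your model must expose is defined in those instructions. As a rough, hypothetical sketch of its general shape, an adapter typically wraps the model behind a single inference call that receives a video (or the portion available so far) plus a question and returns an answer string; all names below are illustrative.

    # Hypothetical adapter shape; the real interface is specified in the
    # repository's model-preparation guide.
    class MyStreamingModel:
        def __init__(self, checkpoint_path):
            # Load weights, tokenizer/processor, etc. for your MLLM here.
            self.checkpoint_path = checkpoint_path

        def answer(self, video_path, question, timestamp):
            """Return an answer using only content up to `timestamp` seconds."""
            # 1. Sample frames (and audio, for omni-source tasks) up to `timestamp`.
            # 2. Build the multimodal prompt from those inputs and the question.
            # 3. Run inference and return the decoded answer text.
            raise NotImplementedError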

Evaluation

Now you can run the benchmark:

    bash eval.sh

This will run the benchmark and save the results to the specified output file. Then you can calculate the metrics using the following command:

    bash stats.sh
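
stats.sh aggregates the saved predictions into the reported metrics. As a hand-rolled illustration of that kind of aggregation (assuming, hypothetically, that eval.sh wrote a JSON list of records with task, answer, and ground_truth fields; the real output format may differ), per-task accuracy could be computed like this:

    # Hypothetical aggregation; assumes a JSON list of records such as
    # {"task": ..., "answer": ..., "ground_truth": ...}.
    import json
    from collections import defaultdict

    def accuracy_by_task(results_path):
        totals, correct = defaultdict(int), defaultdict(int)
        with open(results_path) as f:
            for record in json.load(f):
                totals[record["task"]] += 1
                correct[record["task"]] += int(record["answer"] == record["ground_truth"])
        return {task: correct[task] / totals[task] for task in totals}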

🔬 Experimental Results

Performance of Various MLLMs on StreamingBench

  • All Context (results table)
  • 60 seconds of context preceding the query time (results table)
  • Comparison of Main Experiment vs. 60 Seconds of Video Context (results table)

Performance of Different MLLMs on the Proactive Output Task

"≤ xs" means that the answer is considered correct if the actual output time is within x seconds of the ground truth.

(results table)
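
The scoring rule above reduces to a simple threshold check. A minimal sketch, with made-up predicted and ground-truth output times:

    # Minimal sketch of the "≤ x s" rule: a prediction counts as correct when
    # its output time is within x seconds of the ground-truth time.
    def proactive_accuracy(pred_times, gt_times, threshold_s):
        hits = sum(abs(p - g) <= threshold_s for p, g in zip(pred_times, gt_times))
        return hits / len(gt_times)

    # Made-up example: three questions, evaluated at three thresholds.
    preds, gts = [12.5, 40.0, 71.0], [11.0, 43.5, 70.5]
    for x in (1, 2, 4):
        print(f"<= {x}s accuracy: {proactive_accuracy(preds, gts, x):.2f}")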

📝 Citation

@article{lin2024streaming,
  title={StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding},
  author={Junming Lin and Zheng Fang and Chi Chen and Zihao Wan and Fuwen Luo and Peng Li and Yang Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2411.03628},
  year={2024}
}
