StreamingBench evaluates Multimodal Large Language Models (MLLMs) on real-time streaming video understanding tasks. 🌟
As MLLMs continue to advance, they remain largely focused on offline video comprehension, where all frames are pre-loaded before queries are made. This is far from the human ability to process and respond to video streams in real time, capturing the dynamic nature of multimedia content. To bridge this gap, StreamingBench introduces the first comprehensive benchmark for streaming video understanding in MLLMs.
- 🎯 Real-time Visual Understanding: Can the model process and respond to visual changes in real-time?
- 🔊 Omni-source Understanding: Does the model integrate visual and audio inputs synchronously in real-time video streams?
- 🎬 Contextual Understanding: Can the model comprehend the broader context within video streams?
- 📊 900 diverse videos
- 📝 4,500 human-annotated QA pairs
- ⏱️ Five questions per video at different timestamps
- Python 3.x
- moviepy
1. Download Dataset: Retrieve all necessary files from the StreamingBench Dataset.

2. Decompress Files: Extract the downloaded files and organize them in the `./data` directory as follows:

   ```
   StreamingBench/
   ├── data/
   │   ├── real/       # Unzip Real Time Visual Understanding_*.zip into this folder
   │   ├── omni/       # Unzip other .zip files into this folder
   │   ├── sqa/        # Unzip Sequential Question Answering_*.zip into this folder
   │   └── proactive/  # Unzip Proactive Output_*.zip into this folder
   ```

3. Preprocess Data: Run the following command to preprocess the data:

   ```shell
   cd ./scripts
   bash preprocess.sh
   ```
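The decompression step above can also be scripted with Python's standard library. This is a minimal sketch, assuming the downloaded archives sit in a `downloads/` folder (that folder name is an assumption; the glob patterns follow the archive names shown in the directory layout):

```python
import zipfile
from pathlib import Path

# Map archive-name patterns to their target subfolders under ./data.
# Order matters: the catch-all "*.zip" pattern must come last, so that
# archives already matched by a specific pattern are skipped.
TARGETS = {
    "Real Time Visual Understanding_*.zip": "data/real",
    "Sequential Question Answering_*.zip": "data/sqa",
    "Proactive Output_*.zip": "data/proactive",
    "*.zip": "data/omni",  # everything else goes to omni/
}

def organize(download_dir: str = "downloads") -> None:
    """Extract each downloaded archive into its target data/ subfolder."""
    seen = set()
    for pattern, target in TARGETS.items():
        out = Path(target)
        out.mkdir(parents=True, exist_ok=True)
        for archive in sorted(Path(download_dir).glob(pattern)):
            if archive in seen:
                continue  # already handled by a more specific pattern
            seen.add(archive)
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(out)

# organize()  # run after downloading the archives
```

Since Python 3.7, dicts preserve insertion order, which is what lets the specific patterns take priority over the trailing catch-all.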
Prepare your own model for evaluation by following the instructions provided here. This guide will help you set up and configure your model to ensure it is ready for testing against the dataset.
Now you can run the benchmark:

```shell
bash eval.sh
```

This will run the benchmark and save the results to the specified output file. Then you can calculate the metrics using the following command:

```shell
bash stats.sh
```
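For reference, the accuracy reported at this stage boils down to simple counting over the graded QA pairs. Here is a minimal sketch assuming a hypothetical results list; the field names (`task`, `answer`, `ground_truth`) are illustrative, not the benchmark's actual schema:

```python
from collections import defaultdict

def accuracy_by_task(results):
    """Compute overall and per-task accuracy from graded QA results.

    `results` is a list of dicts with illustrative fields:
    {"task": str, "answer": str, "ground_truth": str}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        total[r["task"]] += 1
        if r["answer"] == r["ground_truth"]:
            correct[r["task"]] += 1
    per_task = {t: correct[t] / total[t] for t in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_task
```

Per-task breakdowns matter here because the three task categories (real-time, omni-source, contextual) stress very different capabilities, and a single aggregate score can hide large gaps between them.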
Results are reported under two context settings:

- All Context: the model receives the entire video stream preceding the query time
- 60s Context: the model receives only the 60 seconds of context preceding the query time

Comparison of Main Experiment vs. 60 Seconds of Video Context:
"≤ xs" means that the answer is considered correct if the actual output time is within x seconds of the ground truth.
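That correctness criterion is a simple tolerance check on the predicted timestamp, which can be written as:

```python
def within_tolerance(output_time: float, gt_time: float, x: float) -> bool:
    """Return True if the actual output time falls within x seconds of
    the ground-truth time, i.e. the "<= x s" criterion."""
    return abs(output_time - gt_time) <= x
```

Note that the check is symmetric: an answer produced x seconds early counts the same as one produced x seconds late.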
@article{lin2024streaming,
title={StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding},
author={Junming Lin and Zheng Fang and Chi Chen and Zihao Wan and Fuwen Luo and Peng Li and Yang Liu and Maosong Sun},
journal={arXiv preprint arXiv:2411.03628},
year={2024}
}