School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
*Equal contribution
†Corresponding author
The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
- [12/2024] Code and checkpoints are released.
- [09/2024] Project page released!
- [09/2024] MoME has been accepted by NeurIPS 2024!
- [07/2024] arXiv paper released.
This is the GitHub repository for MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models. In this work, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM.
MoME consists of two key components: a mixture of vision experts (MoVE) and a mixture of language experts (MoLE). MoVE adaptively modulates the features produced by various vision encoders and is compatible with a wide range of transformation architectures. MoLE incorporates sparsely gated experts into the LLM, improving performance at roughly unchanged inference cost.
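To make the sparse-gating idea behind MoLE concrete, here is a minimal PyTorch sketch of a top-k routed expert layer. It illustrates the general technique only, not the implementation in this repository; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseMoELayer(nn.Module):
    """Toy sparsely gated FFN layer: each token is routed to its top-k experts."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, dim)
        gates = F.softmax(self.router(x), dim=-1)          # (batch, seq, num_experts)
        weights, indices = gates.topk(self.top_k, dim=-1)  # keep only the top-k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Gate weight of expert e for each token (zero where e was not selected).
            w = (weights * (indices == e)).sum(dim=-1, keepdim=True)
            out = out + w * expert(x)                      # dense compute, for clarity only
        return out

# Quick smoke test: route a batch of sequences through the toy layer.
layer = ToySparseMoELayer(dim=32)
print(layer(torch.randn(2, 8, 32)).shape)  # torch.Size([2, 8, 32])
```

For clarity the sketch evaluates every expert densely and zeros out unselected contributions; a real implementation dispatches each token only to its chosen experts, which is what keeps inference cost roughly constant.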
The architecture of the proposed MoME model:
git clone https://github.com/JiuTian-VL/MoME.git
cd MoME
conda create -n mome python=3.12
conda activate mome
pip install -r requirements.txt
Please download all the required checkpoints by running the download_ckpt.py script:
python download_ckpt.py
The required checkpoints will be downloaded from Hugging Face to the ./checkpoints directory.
We provide playground.ipynb, a notebook with a minimal example of how to run inference with the MoME model.
A Gradio demo for model testing and router visualization is also provided in demo_mome.py. You can start it by running the following command:
python demo_mome.py
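For readers unfamiliar with Gradio, the demo follows the usual Interface pattern sketched below. This is a generic illustration of that pattern, not the contents of demo_mome.py; the answer function is a hypothetical stand-in for the actual MoME inference and router-visualization code.

```python
import gradio as gr

def answer(image, question):
    # Hypothetical stand-in: a real demo would run MoME inference on (image, question).
    return f"Echo: {question}"

# Image + question in, text out; demo_mome.py additionally visualizes the routers.
demo = gr.Interface(
    fn=answer,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
)

if __name__ == "__main__":
    demo.launch()
```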
We collected 24 datasets and categorized them into four groups for instruction-tuning and evaluation:
Below we compare the multitasking performance of MoME with the baselines. Please refer to our paper for more details.
If you find this work useful for your research, please kindly cite our paper:
@inproceedings{shen2024mome,
title={MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models},
author={Shen, Leyang and Chen, Gongwei and Shao, Rui and Guan, Weili and Nie, Liqiang},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}