AntGMM-Chat

Approach

AntGMM-Chat is a multimodal model trained on a variety of multimodal datasets, which allows it to benefit a wide range of downstream tasks.

AntGMM-Chat has a total of 8B parameters. The visual encoder, dubbed AntGMM-Encoder, is a transformer-based model with roughly 1B parameters. The LLM, LLaMA2-7B-ZH, shares the same architecture as LLaMA2-7B. Before multimodal training, we pre-train the visual encoder and the LLM separately on in-house visual and multilingual data: the visual encoder is trained from scratch, while the LLM is initialized from LLaMA2-7B pre-trained weights. During multimodal training, we freeze LLaMA2-7B-ZH and update the remaining parameters of the model.
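
The freeze/update split above can be expressed directly in PyTorch. The sketch below is illustrative only: the linear projector, the module wiring, and the HuggingFace-style `inputs_embeds` interface are assumptions rather than details from this repository; the only part taken from the description above is freezing LLaMA2-7B-ZH while training the remaining parameters.

```python
import torch
import torch.nn as nn


class AntGMMChat(nn.Module):
    """Illustrative wrapper: trainable visual encoder + frozen LLM.

    Only the "freeze LLaMA2-7B-ZH, train everything else" split comes from
    the description above; the encoder/projector/LLM wiring is assumed.
    """

    def __init__(self, visual_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int, llm_dim: int):
        super().__init__()
        self.visual_encoder = visual_encoder             # ~1B-param transformer, stays trainable
        self.projector = nn.Linear(vision_dim, llm_dim)  # maps visual features into the LLM embedding space (assumed)
        self.llm = llm                                   # LLaMA2-7B-ZH, frozen during multimodal training

        # Freeze the LLM; the visual encoder and projector remain trainable.
        for p in self.llm.parameters():
            p.requires_grad = False

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor):
        visual_tokens = self.projector(self.visual_encoder(images))
        inputs = torch.cat([visual_tokens, text_embeds], dim=1)
        # Assumes a HuggingFace-style LLM that accepts `inputs_embeds`.
        return self.llm(inputs_embeds=inputs)


# Only the non-frozen parameters (encoder + projector) are handed to the optimizer:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```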

Model architecture
