MultiModal Intent Analysis (MMIA)


MMIA is the first open-source platform for multimodal intent analysis. It includes state-of-the-art algorithms for multimodal classification, out-of-distribution detection, and clustering for intent analysis in conversational interactions. The repo supports adding new datasets and algorithms and makes it convenient to configure parameters.

Updates 🔥 🔥 🔥

Date | Announcements
12/2024 | 🎆 🎆 The first platform for multimodal intent analysis has been released. Refer to the directory MMIA for the dataset and codes.
5/2024 | 🎆 🎆 An unsupervised multimodal clustering method (UMC) has been released. Refer to the paper UMC.
3/2024 | 🎆 🎆 A token-level contrastive learning method with modality-aware prompting (TCL-MAP) has been released. Refer to the paper TCL-MAP.
1/2024 | 🎆 🎆 The first large-scale multimodal intent dataset has been released. Refer to the directory MIntRec2.0 for the dataset and codes. Read the paper: MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations (published in ICLR 2024).
10/2022 | 🎆 🎆 The first multimodal intent dataset was published. Refer to the directory MIntRec for the dataset and codes. Read the paper: MIntRec: A New Dataset for Multimodal Intent Recognition (published in ACM MM 2022).

Features

MMIA has the following features:

  • Large in Scale: MMIA contains 4 datasets in total: MIntRec, MIntRec2.0, MELD-DA, and IEMOCAP-DA.

  • Multi-turn & Multi-party Dialogues: For example, MIntRec2.0 contains 1,245 dialogues in continuous conversations, with an average of 12 utterances per dialogue. Every utterance carries an intent label, and every dialogue involves at least two different speakers, with speaker identities annotated for each utterance.

  • Out-of-distribution Detection: Since real-world dialogues occur in open-world scenarios, as suggested in TEXTOIR, we further include an OOD tag for utterances that do not belong to any of the existing intent classes. These utterances can be used for out-of-distribution detection to improve system robustness.

Datasets

Here we provide the details of the datasets in MMIA. You can download the datasets from the following links.

Dataset | Source
MIntRec | Paper
MIntRec2.0 | Paper
MELD-DA | Paper
IEMOCAP-DA | Paper

Integrated Models

Here we provide the details of the models in MMIA.

Model Name | Source | Published
MULT | Paper / Code | ACL 2019
MAG_BERT | Paper / Code | ACL 2020
MCN | Paper / Code | CVPR 2020
CC | Paper / Code | AAAI 2021
MMIM | Paper / Code | EMNLP 2021
sccl | Paper / Code | NAACL 2021
USNID | Paper / Code | IEEE TKDE 2023
SDIF | Paper / Code | ICASSP 2024
TCL_MAP | Paper / Code | AAAI 2024
UMC | Paper / Code | ACL 2024

Results

Please refer to results for the detailed performance of the models in MMIA.

Quick start

  1. Use Anaconda to create a Python environment

    conda create --name MMIA python=3.9
    conda activate MMIA
    
  2. Install PyTorch (CUDA version 11.3). A short check for the installation is shown after this list.

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
    
  3. Clone the MMIA repository.

    git clone git@github.com:thuiar/MMIA.git
    cd MMIA
    
  4. Install the required dependencies

    pip install -r requirements.txt
    
  5. Run an example (take mag_bert as an example; more scripts can be found in examples)

    sh examples/multi_turn/run_mag_bert_multiturn.sh
    

Note: you need to set the file paths in the .sh file correctly before running it.
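
After step 2, you can optionally verify the PyTorch installation with a short check. This snippet is illustrative and not part of MMIA:

    import torch

    # Print the installed PyTorch version and whether a CUDA device is visible.
    print(torch.__version__)
    print(torch.cuda.is_available())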

Extensibility

a. How to add a new dataset?

  1. Prepare Data
    Create a new directory to store your dataset and provide train.tsv, dev.tsv, and test.tsv files. Specify the dataset path in the .sh file.

  2. Dataloader Setting
    Add the new dataset name to the benchmarks list in data, and define the intent_labels, max_seq_lengths, ood_data, and other information for the new dataset. For example:

'MIntRec': {
    'intent_labels': [
        # list the intent labels of the new dataset here
    ],
    'max_seq_lengths': {
        'text': 30,
        'video': 230,
        'audio': 480,
    },
    'ood_data': {
        'MIntRec-OOD': {'ood_label': 'UNK'}
    }
},
  3. Feature Files
    To prepare features for video and audio, define the feature files in features_config and place the corresponding files in data_path/video_data/ and data_path/audio_data/ (a short sanity check for these files is sketched after the example below). For example:
video_feats_path = {
    'swin-roi': 'swin_roi.pkl',
    # 'swin-roi': 'swin_roi_binary.pkl',
    'resnet-50': 'video_feats.pkl',
    'swin-full': 'swin_feats.pkl'    # tcl / IEMOCAP / MELD-DA
}
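
Before registering new feature files, it can help to check that they load and have the expected structure. The snippet below is a minimal, illustrative sanity check, not part of MMIA; the directory name and the assumption that each pickle maps utterance ids to feature arrays are hypothetical, so adjust them to your data.

# Illustrative sanity check for newly prepared feature files (not part of MMIA).
# Assumed layout:
#   <data_path>/train.tsv, dev.tsv, test.tsv
#   <data_path>/video_data/swin_roi.pkl
#   <data_path>/audio_data/<audio feature file named in features_config>
import os
import pickle

data_path = 'datasets/MyDataset'    # hypothetical dataset directory
feats_file = os.path.join(data_path, 'video_data', 'swin_roi.pkl')

with open(feats_file, 'rb') as f:
    feats = pickle.load(f)

# Assuming the pickle maps utterance ids to feature arrays, print one entry to
# confirm that the keys and feature dimensions match what the dataloader expects.
key = next(iter(feats))
print(key, getattr(feats[key], 'shape', type(feats[key])))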

b. How to add a new backbone?

  1. Provide the new backbone in backbones by creating a new model file and registering it. For example:
from .FeatureNets import BERTEncoder, RoBERTaEncoder
# from sentence_transformers import SentenceTransformer

text_backbones_map = {
    'bert-base-uncased': BERTEncoder,
    # ... other text backbones
}
  2. Configure the new backbone in configs. For example (a fuller sketch of a complete new backbone follows below):
pretrained_models_path = {
    'bert-base-uncased': '/home/sharing/disk1/pretrained_embedding/bert/uncased_L-12_H-768_A-12/',
    'bert-large-uncased': '/home/sharing/disk1/pretrained_embedding/bert/bert-large-uncased',
}
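
For concreteness, the sketch below shows one way a new text backbone class might look before it is registered in text_backbones_map and pretrained_models_path. The class name, the args.text_backbone_path attribute, and the forward signature are assumptions for illustration; MMIA's actual encoder interface may differ.

# Illustrative sketch of a new text backbone (names and interface are assumptions).
import torch.nn as nn
from transformers import RobertaModel

class MyRoBERTaEncoder(nn.Module):
    """Hypothetical encoder wrapping a Hugging Face RoBERTa model."""
    def __init__(self, args):
        super().__init__()
        # args.text_backbone_path is assumed to carry the local path configured
        # in pretrained_models_path for this backbone.
        self.roberta = RobertaModel.from_pretrained(args.text_backbone_path)

    def forward(self, input_ids, attention_mask):
        outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        return outputs.last_hidden_state

# Register it (in backbones and configs respectively):
# text_backbones_map['roberta-base'] = MyRoBERTaEncoder
# pretrained_models_path['roberta-base'] = '/path/to/roberta-base'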

c. How to add a new method?

  1. Provide the model for the new method in backbones by creating a new model file and importing it. For example:
from .mag_bert import MAG_BERT
  2. Configure the parameters for the new method in configs, for example mag_bert_config.

  3. Add the new method in methods and create a new manager file, for example mag_bert. You need to define the optimizer, loss function, and the training and testing routines (a minimal manager sketch is given after this list). Then register the manager in method_map:

from .MAG_BERT.manager import MAG_BERT
from .TEXT.manager import TEXT
from .MULT.manager import MULT


method_map = {
    'mag_bert': MAG_BERT,
    'text': TEXT,
    'mult': MULT,
    # ... other methods
}
  4. Add a new example script in examples, for example mag_bert.
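
To make the required pieces concrete, here is a minimal manager sketch covering the optimizer, loss function, and training/testing routines mentioned in step 3. The class and method names, the batch structure (text, video, audio, labels), and the args fields are assumptions for illustration; MMIA's actual manager interface may differ.

# Minimal, illustrative manager sketch (interface details are assumptions).
import torch
import torch.nn as nn
from torch.optim import AdamW

class MyMethodManager:
    def __init__(self, args, model):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = model.to(self.device)
        self.optimizer = AdamW(self.model.parameters(), lr=args.lr)  # lr taken from the method's config
        self.criterion = nn.CrossEntropyLoss()

    def _train(self, dataloader):
        self.model.train()
        for text, video, audio, labels in dataloader:
            text, video, audio, labels = (x.to(self.device) for x in (text, video, audio, labels))
            logits = self.model(text, video, audio)
            loss = self.criterion(logits, labels)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

    def _test(self, dataloader):
        self.model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for text, video, audio, labels in dataloader:
                text, video, audio, labels = (x.to(self.device) for x in (text, video, audio, labels))
                preds = self.model(text, video, audio).argmax(dim=-1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        return correct / max(total, 1)

# Finally, register it:
# method_map['my_method'] = MyMethodManager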

Citations

If this work is helpful, or if you use the codes and results in this repo, please cite the following papers:

@inproceedings{MIntRec2.0,
  title={{MI}ntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations},
  author={Zhang, Hanlei and Wang, Xin and Xu, Hua and Zhou, Qianrui and Su, Jianhua and Zhao, Jinyue and Li, Wenrui and Chen, Yanting and Gao, Kai},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=nY9nITZQjc}
}
@inproceedings{MIntRec,
  title={MIntRec: A New Dataset for Multimodal Intent Recognition},
  author={Zhang, Hanlei and Xu, Hua and Wang, Xin and Zhou, Qianrui and Zhao, Shaojie and Teng, Jiayan},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={1688--1697},
  year={2022}
}
@inproceedings{UMC,
  title={Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances},
  author={Zhang, Hanlei and Xu, Hua and Long, Fei and Wang, Xin and Gao, Kai},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={18--35},
  year={2024},
  url={https://aclanthology.org/2024.acl-long.2},
  doi={10.18653/v1/2024.acl-long.2}
}
@inproceedings{TCL-MAP,
  title={Token-level contrastive learning with modality-aware prompting for multimodal intent recognition},
  author={Zhou, Qianrui and Xu, Hua and Li, Hao and Zhang, Hanlei and Zhang, Xiaohan and Wang, Yifan and Gao, Kai},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={15},
  pages={17114--17122},
  year={2024}
}
