Date | Announcements |
---|---|
12/2024 | 🎆 🎆 The first platform for multimodal intent analysis has been released. Refer to the directory MMIA for the dataset and codes. |
5/2024 | 🎆 🎆 An unsupervised multimodal clustering method (UMC) has been released. Refer to the paper UMC. |
3/2024 | 🎆 🎆 A token-level contrastive learning method with modality-aware prompting (TCL-MAP) has been released. Refer to the paper TCL-MAP. |
1/2024 | 🎆 🎆 The first large-scale multimodal intent dataset has been released. Refer to the directory MIntRec2.0 for the dataset and codes. Read the paper -- MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations (Published in ICLR 2024). |
10/2022 | 🎆 🎆 The first multimodal intent dataset was published. Refer to the directory MIntRec for the dataset and codes. Read the paper -- MIntRec: A New Dataset for Multimodal Intent Recognition (Published in ACM MM 2022). |
MMIA has the following features:

- Large in Scale: It contains 4 datasets in total: MIntRec, MIntRec2.0, MELD-DA, and IEMOCAP-DA.

- Multi-turn & Multi-party Dialogues: For example, MIntRec2.0 contains 1,245 dialogues with an average of 12 utterances per dialogue in continuous conversations. Every utterance in each dialogue has an intent label, and each dialogue has at least two different speakers with annotated speaker identities for every utterance.

- Out-of-distribution Detection: As real-world dialogues occur in open-world scenarios, as suggested in TEXTOIR, we further include an OOD tag for utterances that do not belong to any of the existing intent classes. These can be used for out-of-distribution detection and to improve system robustness.
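As a minimal sketch of how the OOD tag can be used: the snippet below assumes each utterance record carries a label field and uses 'UNK' as the OOD tag, matching the ood_label shown in the dataloader configuration later in this README. The example utterances and the 'Agree' label are purely illustrative.

```python
# illustrative records; 'UNK' marks out-of-distribution utterances
utterances = [
    {'text': 'Sounds great, count me in.', 'label': 'Agree'},
    {'text': 'Did you hear that noise outside?', 'label': 'UNK'},
]

in_scope = [u for u in utterances if u['label'] != 'UNK']
ood = [u for u in utterances if u['label'] == 'UNK']
print(len(in_scope), len(ood))
```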
Here we provide the details of the datasets in MMIA. You can download the datasets from the following links.
Datasets | Source |
---|---|
MIntRec | Paper |
MIntRec2.0 | Paper |
MELD-DA | Paper |
IEMOCAP-DA | Paper |
Here we provide the details of the models in MMIA.
Model Name | Source | Published |
---|---|---|
MULT | Paper / Code | ACL 2019 |
MAG_BERT | Paper / Code | ACL 2020 |
MCN | Paper / Code | CVPR 2020 |
CC | Paper / Code | AAAI 2021 |
MMIM | Paper / Code | EMNLP 2021 |
SCCL | Paper / Code | NAACL 2021 |
USNID | Paper / Code | IEEE TKDE 2023 |
SDIF | Paper / Code | ICASSP 2024 |
TCL_MAP | Paper / Code | AAAI 2024 |
UMC | Paper / Code | ACL 2024 |
Please refer to results for the detailed performance of each model in MMIA.
- Use anaconda to create the Python environment:

      conda create --name MMIA python=3.9
      conda activate MMIA

- Install PyTorch (with CUDA 11.3); a quick sanity check is sketched after this list:

      conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

- Clone the MMIA repository:

      git clone [email protected]:thuiar/MMIA.git
      cd MMIA

- Install the related dependencies:

      pip install -r requirements.txt

- Run an example (taking mag-bert as an example; more can be seen here):

      sh examples/multi_turn/run_mag_bert_multiturn.sh

  Note: make sure the file paths in the .sh script are set correctly.
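After installing PyTorch, you can verify that the CUDA build was picked up correctly. This is a minimal check using only the standard torch API:

```python
import torch

print(torch.__version__)            # installed PyTorch version
print(torch.version.cuda)           # CUDA toolkit the build was compiled against, e.g. '11.3'
print(torch.cuda.is_available())    # True if a compatible GPU and driver are visible
```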
- Prepare Data

  Create a new directory to store your dataset. You should provide train.tsv, dev.tsv, and test.tsv, and specify the dataset path in the .sh file.
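As a quick sanity check on the new splits, you can load them with pandas. This is a minimal sketch; it assumes the .tsv files are tab-separated with a header row, and the directory name is illustrative.

```python
import pandas as pd

# hypothetical dataset directory; replace with the path you configured in the .sh file
data_dir = 'data/your_dataset'

for split in ['train', 'dev', 'test']:
    df = pd.read_csv(f'{data_dir}/{split}.tsv', sep='\t')
    print(split, df.shape, list(df.columns))
```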
- Dataloader Setting

  Add the new dataset name to the benchmarks list in data, and define the intent_labels, max_seq_lengths, ood_data, and other information for the new dataset. For example:
'MIntRec': {
    'intent_labels': [
        # list the intent labels of the new dataset here
    ],
    'max_seq_lengths': {
        'text': 30,
        'video': 230,
        'audio': 480,
    },
    'ood_data': {
        'MIntRec-OOD': {'ood_label': 'UNK'}
    }
},
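These fields are what the dataloader looks up for the dataset. A minimal illustration of the lookups, assuming the entry above lives in the benchmarks dictionary mentioned in this step:

```python
# assuming `benchmarks` is the dictionary in data that holds the entry above
max_text_len = benchmarks['MIntRec']['max_seq_lengths']['text']               # 30
ood_label = benchmarks['MIntRec']['ood_data']['MIntRec-OOD']['ood_label']     # 'UNK'
num_intents = len(benchmarks['MIntRec']['intent_labels'])
```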
- Feature Data

  To prepare features for video and audio, define the feature files in features_config and place the corresponding files under data_path/video_data/ and data_path/audio_data/. For example:

video_feats_path = {
    'swin-roi': 'swin_roi.pkl',        # alternative: 'swin_roi_binary.pkl'
    'resnet-50': 'video_feats.pkl',
    'swin-full': 'swin_feats.pkl',     # used for TCL-MAP, IEMOCAP, and MELD-DA
}
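The exact contents of these .pkl files depend on your feature extractor. Below is a minimal sketch of writing and reading one such file, assuming it stores a mapping from utterance id to a feature array; the key format and feature dimension shown here are hypothetical.

```python
import pickle
import numpy as np

# hypothetical example: one feature matrix (frames x dim) per utterance id
video_feats = {
    'dia1_utt1': np.zeros((230, 256), dtype=np.float32),
}

with open('data_path/video_data/swin_roi.pkl', 'wb') as f:
    pickle.dump(video_feats, f)

with open('data_path/video_data/swin_roi.pkl', 'rb') as f:
    loaded = pickle.load(f)
print(len(loaded), loaded['dia1_utt1'].shape)
```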
- Provide a new backbone in backbones: create a new model file and register it in the backbone map. For example:

from .FeatureNets import BERTEncoder, RoBERTaEncoder

text_backbones_map = {
    'bert-base-uncased': BERTEncoder,
}
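If you add a brand-new text backbone, the wrapper class could look roughly like the sketch below. This is illustrative only: the class name, constructor signature, and the 'my-text-encoder' key are hypothetical and not part of the repo; the existing encoders in FeatureNets define the actual interface.

```python
import torch.nn as nn
from transformers import AutoModel

class MyTextEncoder(nn.Module):
    """Hypothetical text backbone in the style of the FeatureNets encoders."""

    def __init__(self, pretrained_path):
        super().__init__()
        # load a HuggingFace-format checkpoint from the configured path
        self.encoder = AutoModel.from_pretrained(pretrained_path)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return outputs.last_hidden_state   # (batch, seq_len, hidden_dim)

# register it alongside the existing backbones (hypothetical key)
# text_backbones_map['my-text-encoder'] = MyTextEncoder
```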
- Configure the new backbone in configs. For example:

pretrained_models_path = {
    'bert-base-uncased': '/home/sharing/disk1/pretrained_embedding/bert/uncased_L-12_H-768_A-12/',
    'bert-large-uncased': '/home/sharing/disk1/pretrained_embedding/bert/bert-large-uncased',
}
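These paths are expected to point at locally stored checkpoints. As a minimal check that a configured path loads, assuming the directory contains a standard HuggingFace-format BERT checkpoint:

```python
from transformers import BertModel, BertTokenizer

path = pretrained_models_path['bert-base-uncased']
tokenizer = BertTokenizer.from_pretrained(path)
model = BertModel.from_pretrained(path)
print(model.config.hidden_size)   # 768 for bert-base-uncased
```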
- Add the new model in backbones: create a new model file and import the model class. For example:

from .mag_bert import MAG_BERT
- Configure the parameters for the new method in configs, for example mag_bert_config.
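For illustration only, such a config typically groups the method's training hyperparameters. The parameter names and values below are hypothetical; the actual mag_bert_config in the repo defines its own set.

```python
# hypothetical hyperparameter settings for a new method config
hyperparams = {
    'num_train_epochs': 100,
    'train_batch_size': 16,
    'eval_batch_size': 8,
    'lr': 2e-5,           # learning rate
    'weight_decay': 0.1,
}
```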
- Add the new method in method and create a new manager file, for example mag_bert. You need to define the optimizer, loss function, and the training and testing procedures, and register the manager in method_map. For example (an illustrative manager skeleton follows the snippet below):
from .MAG_BERT.manager import MAG_BERT
from .TEXT.manager import TEXT
from .MULT.manager import MULT

method_map = {
    'mag_bert': MAG_BERT,
    'text': TEXT,
    'mult': MULT,
}
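An illustrative skeleton of such a manager is sketched below. It is not the repo's actual interface: the class name, constructor arguments, and the assumed attributes of args, data, and model are hypothetical; it only shows the pieces this step asks for (optimizer, loss function, and train/test loops).

```python
import torch
from torch import nn

class NEW_METHOD_MANAGER:
    """Hypothetical manager wiring up the optimizer, loss, and train/test loops."""

    def __init__(self, args, data, model):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = model.to(self.device)
        self.optimizer = torch.optim.AdamW(self.model.parameters(), lr=args.lr)
        self.criterion = nn.CrossEntropyLoss()
        # assumed attributes on the data object
        self.train_dataloader = data.train_dataloader
        self.test_dataloader = data.test_dataloader

    def _train(self, args):
        self.model.train()
        for epoch in range(args.num_train_epochs):
            for features, labels in self.train_dataloader:   # assumed batch structure
                features = features.to(self.device)
                labels = labels.to(self.device)
                self.optimizer.zero_grad()
                logits = self.model(features)                # assumed forward signature
                loss = self.criterion(logits, labels)
                loss.backward()
                self.optimizer.step()

    def _test(self, args):
        self.model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for features, labels in self.test_dataloader:
                features = features.to(self.device)
                labels = labels.to(self.device)
                logits = self.model(features)
                correct += (logits.argmax(dim=-1) == labels).sum().item()
                total += labels.size(0)
        return correct / max(total, 1)

# register the new manager (hypothetical key)
# method_map['new_method'] = NEW_METHOD_MANAGER
```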
If this work is helpful, or if you use the code or results in this repo, please cite the following papers:
- MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
- MIntRec: A New Dataset for Multimodal Intent Recognition
- Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
- Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
@inproceedings{MIntRec2.0,
title={{MI}ntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations},
author={Zhang, Hanlei and Wang, Xin and Xu, Hua and Zhou, Qianrui and Su, Jianhua and Zhao, Jinyue and Li, Wenrui and Chen, Yanting and Gao, Kai},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=nY9nITZQjc}
}
@inproceedings{MIntRec,
author = {Zhang, Hanlei and Xu, Hua and Wang, Xin and Zhou, Qianrui and Zhao, Shaojie and Teng, Jiayan},
title = {MIntRec: A New Dataset for Multimodal Intent Recognition},
year = {2022},
booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
pages = {1688--1697},
}
@inproceedings{UMC,
title = {Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances},
author = {Zhang, Hanlei and Xu, Hua and Long, Fei and Wang, Xin and Gao, Kai},
booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year = {2024},
url = {https://aclanthology.org/2024.acl-long.2},
doi = {10.18653/v1/2024.acl-long.2},
pages = {18--35},
}
@inproceedings{TCL-MAP,
title={Token-level contrastive learning with modality-aware prompting for multimodal intent recognition},
author={Zhou, Qianrui and Xu, Hua and Li, Hao and Zhang, Hanlei and Zhang, Xiaohan and Wang, Yifan and Gao, Kai},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={15},
pages={17114--17122},
year={2024}
}