korean-audiotext-transformer

Multimodal Transformer for Korean Sentiment Analysis with Audio and Text Features

Overview

STEP1: Convert the input audio to text using the Google ASR API
STEP2: Extract MFCC features from the input audio
STEP3: Further pre-train KoBERT with masked language modeling (MLM) on colloquial review texts crawled from various services
STEP4: Extract a sentence embedding by feeding the text obtained in STEP1 to the BERT model trained in STEP3
STEP5: Obtain a fused representation by feeding the MFCC features (from STEP2) and the sentence embedding (from STEP4) to the Crossmodal Transformer (Tsai et al., 2019)
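
The fusion in STEP5 relies on crossmodal attention, where queries come from one modality and keys and values come from the other. The following is a minimal NumPy sketch of a single attention head, not the code in model.py. The function name, the random projection matrices, the 768-dimensional text states, and the choice of a token sequence (rather than a single sentence vector) on the text side are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def crossmodal_attention(target, source, w_q, w_k, w_v):
    """Attend from `target` time steps to `source` time steps.

    target: (T_target, d_target), e.g. text states
    source: (T_source, d_source), e.g. MFCC frames
    """
    q = target @ w_q                      # (T_target, d_k)
    k = source @ w_k                      # (T_source, d_k)
    v = source @ w_v                      # (T_source, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)    # each row sums to 1 over source steps
    return weights @ v, weights


rng = np.random.default_rng(0)
text = rng.normal(size=(12, 768))     # hypothetical: 12 tokens, 768-dim states
audio = rng.normal(size=(400, 40))    # 400 frames x 40 MFCCs, as in the training flags
d_model = 64
w_q = rng.normal(size=(768, d_model)) * 0.02
w_k = rng.normal(size=(40, d_model)) * 0.02
w_v = rng.normal(size=(40, d_model)) * 0.02

fused, weights = crossmodal_attention(text, audio, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each text position ends up with a 64-dimensional summary of the audio frames it attends to. Swapping `target` and `source` gives the opposite direction.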

Datasets

Korean Multimodal Sentiment Analysis Dataset

자율지능 디지털 동반자 감정 분류용 데이터셋 (Emotion Classification Dataset for Autonomous Intelligent Digital Companions; registration and authorization are required to download)
Classes: 분노(anger), 공포(fear), 중립(neutrality), 슬픔(sadness), 행복(happiness), 놀람(surprise), 혐오(disgust)
Provided Modality: Video, Audio, Text
- We use only the audio and text data
- At test time, text is obtained from the audio via ASR rather than from the provided transcripts
- The video modality is not used in this project
Train / Dev / Test Split
- Based on audio: 8278 / 1014 / 1003
- Based on text: 280 / 35 / 35
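
The text-based counts are consistent with an 80/10/10 split over 350 sentences, which matches the train_size=.8 argument of the preprocessing command. The sketch below reproduces those counts under the assumption that the split is made over unique sentences and that the remainder is halved between dev and test. The function `split_sentences` is hypothetical and preprocess.py may work differently.

```python
import random


def split_sentences(sentences, train_size=0.8, seed=0):
    """Split unique sentences into train/dev/test; the remainder is halved."""
    unique = sorted(set(sentences))
    random.Random(seed).shuffle(unique)
    n_train = round(len(unique) * train_size)
    n_dev = (len(unique) - n_train) // 2
    train = set(unique[:n_train])
    dev = set(unique[n_train:n_train + n_dev])
    test = set(unique[n_train + n_dev:])
    return train, dev, test


# 350 placeholder strings stand in for the script sentences.
sentences = [f"sentence-{i}" for i in range(350)]
train, dev, test = split_sentences(sentences, train_size=0.8)
print(len(train), len(dev), len(test))  # 280 35 35
```

Splitting by sentence rather than by recording keeps every reading of a given sentence in a single subset.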

Preprocess

Place the downloaded dataset as follows:

korean-audiotext-transformer/    
└── data/
    └── 4.1 감정분류용 데이터셋/
        ├── 000/
        ├── 001/
        ├── 002/
        ├── ...
        ├── 099/
        ├── participant_info.xlsx
        ├── rename_file.sh
        ├── Script.hwp
        └── test.py

Convert Script.hwp to script.txt

cd 'data/4.1 감정분류용 데이터셋'
hwp5txt Script.hwp --output script.txt

Generate {train, dev, test}.pkl

python preprocess.py \
  raw_path='./data/4.1 감정분류용 데이터셋' \
  script_path='./data/4.1 감정분류용 데이터셋/script.txt' \
  save_path='./data' \
  train_size=.8

Preprocessed Output (train.pkl)

person_idx	audio	sentence	emotion
0	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	오늘 입고 나가야지.	행복
2	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	오늘 입고 나가야지.	행복
7	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	오늘 입고 나가야지.	행복
12	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	오늘 입고 나가야지.	행복
17	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	오늘 입고 나가야지.	행복
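
The preview above looks like the head of a pandas DataFrame, so the pickles can probably be inspected with pandas. The sketch below assumes that format and builds a tiny stand-in frame with made-up values. With the real file, the frame would presumably come from `pd.read_pickle("./data/train.pkl")`, a path inferred from the save_path argument.

```python
import pandas as pd

# A tiny stand-in with the same columns as the preview; values are made up.
df = pd.DataFrame(
    {
        "person_idx": [0, 2, 7],
        "audio": [[0, 0, 0], [0, 1, 0], [0, 0, 2]],
        "sentence": ["오늘 입고 나가야지.", "오늘 입고 나가야지.", "placeholder sentence"],
        "emotion": ["행복", "행복", "슬픔"],
    }
)

counts = df["emotion"].value_counts()     # samples per emotion label
n_sentences = df["sentence"].nunique()    # distinct script sentences
audio_lengths = df["audio"].apply(len)    # number of waveform samples per row
print(counts.to_dict(), n_sentences, audio_lengths.max())
```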

Usage

Prerequisites

We recommend using a conda environment for setup.

conda create -n <your_env_name> python=3.6
conda activate <your_env_name>
conda install pip
pip install -r requirements.txt

Before training,

Download the fine-tuned BERT model and place it as follows:

The model has already been fine-tuned on a Korean sentiment analysis dataset
We use this model as a text embedding module and do not fine-tune it further

korean-audiotext-transformer/    
└── KoBERT/
    ├── args.bin
    ├── config.json
    ├── model.bin
    ├── tokenization.py
    └── vocab.list

Train AudioText Transformer

The specified hyperparameters are the ones that performed best on the development set

python train.py \
  --data_path='./data' \
  --bert_path='./KoBERT' \
  --save_path='./result' \
  --attn_dropout=.2 \
  --relu_dropout=.1 \
  --emb_dropout=.2 \
  --res_dropout=.1 \
  --out_dropout=.1 \
  --n_layers=2 \
  --d_model=64 \
  --n_heads=8 \
  --lr=1e-5 \
  --epochs=10 \
  --batch_size=64 \
  --clip=1.0 \
  --warmup_percent=.1 \
  --max_len_audio=400 \
  --sample_rate=48000 \
  --resample_rate=16000 \
  --n_fft_size=400 \
  --n_mfcc=40
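
To make the effect of --warmup_percent concrete, the sketch below computes the step counts implied by the flags above and the 8278 training samples. The schedule shape is an assumption (linear warmup followed by linear decay, a common choice for BERT-style training). The actual scheduler in train.py is not shown here, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-5, warmup_percent=0.1):
    """Linear warmup to base_lr, then linear decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_percent)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


steps_per_epoch = math.ceil(8278 / 64)   # training samples / batch_size
total_steps = steps_per_epoch * 10       # epochs
print(steps_per_epoch, total_steps, int(total_steps * 0.1))  # 130 1300 130
```

Under these assumptions, the learning rate reaches its peak of 1e-5 at the end of the first epoch.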

Train Audio-Only Baseline

python train.py --only_audio \
  --n_layers=4 \
  --n_heads=8 \
  --lr=1e-3 \
  --epochs=10 \
  --batch_size=64

Train Text-Only Baseline

python train.py --only_text \
  --n_layers=4 \
  --n_heads=8 \
  --lr=1e-3 \
  --epochs=10 \
  --batch_size=64

Evaluate Models

python eval.py [--FLAGS]

Results

Text-Only Baseline

Emotion	Total	Fear	Surprise	Anger	Sadness	Neutrality	Happiness	Disgust
F1-score	33.95	75.00	33.33	44.44	22.22	18.18	44.44	0.00

Audio-Only Baseline

Emotion	Total	Fear	Surprise	Anger	Sadness	Neutrality	Happiness	Disgust
F1-score	35.28	31.84	42.68	24.71	47.32	35.80	44.52	20.12

Multimodal (Crossmodal) Transformer

Emotion	Total	Fear	Surprise	Anger	Sadness	Neutrality	Happiness	Disgust
F1-score	52.54	44.18	34.44	50.95	81.81	34.28	65.93	56.19
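
The Total column appears to be the unweighted (macro) average of the seven per-class F1 scores. This is inferred from the numbers rather than stated in the tables, and the short check below recomputes it from the reported per-class values.

```python
# Per-class F1 scores copied from the three result tables, in column order:
# fear, surprise, anger, sadness, neutrality, happiness, disgust.
results = {
    "text_only": (33.95, [75.00, 33.33, 44.44, 22.22, 18.18, 44.44, 0.00]),
    "audio_only": (35.28, [31.84, 42.68, 24.71, 47.32, 35.80, 44.52, 20.12]),
    "multimodal": (52.54, [44.18, 34.44, 50.95, 81.81, 34.28, 65.93, 56.19]),
}

macro = {}
for name, (reported_total, per_class) in results.items():
    macro[name] = sum(per_class) / len(per_class)
    # Differences of about 0.01 are expected because the inputs are rounded.
    print(f"{name}: macro={macro[name]:.2f} reported={reported_total:.2f}")
```

Because a macro average weights every class equally, the 0.00 score for disgust pulls the text-only total down as much as any other class would.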
