
Multi-Teacher Language-Aware Knowledge Distillation for Speech Emotion Recognition


Installation

1. Clone the Repository

git clone https://github.com/aalto-speech/mtkd4ser.git
cd mtkd4ser

2. Create the Environment

conda env create -f environment.yml

3. Activate the Environment

conda activate ser_venv
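
To verify that the environment is set up correctly, you can run a quick sanity check (a minimal sketch; the exact packages and versions are pinned in environment.yml):

import torch  # pinned by environment.yml

# Confirm PyTorch imports and report whether a CUDA GPU is visible.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())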

Usage

1. Multi-Teacher Language-Aware Knowledge Distillation for English Speech Emotion Recognition Using the Monolingual Setup

python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 5 --TRAINING 1 --PARADIGM "MTKD" --LANGUAGE "EN" --LINGUALITY "Monolingual"

2. Conventional Knowledge Distillation for Finnish Speech Emotion Recognition Using the Multilingual Setup

python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 9 --TRAINING 1 --PARADIGM "KD" --LANGUAGE "FI" --LINGUALITY "Multilingual"

3. Vanilla Fine-Tuning for French Speech Emotion Recognition Using the Multilingual Setup

python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 1 --TRAINING 1 --PARADIGM "FT" --LANGUAGE "FR" --LINGUALITY "Multilingual"

4. Available Configurations and Choices

The main.py script supports a range of configurable parameters for training, validation, and evaluation. The table below details each configuration and its options; select the options that fit your use case.

Configuration    Options
LINGUALITY       Monolingual or Multilingual
LANGUAGE         EN or FI or FR
PARADIGM         MTKD or KD or FT
TRAINING         1 or 0
SESSION          EN: 1-5, FI: 1-9, FR: 1
N_EPOCHS         ℤ⁺ (any positive integer)
BATCH_SIZE       ℤ⁺ (any positive integer)
LEARNING_RATE    ℝ⁺ (any positive real number)
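
The SESSION ranges above lend themselves to session-wise sweeps. A run over all English sessions can be scripted as follows (a minimal sketch; the flags mirror usage example 1, and the session range follows the table above):

import subprocess

# Sweep over the five English sessions (see the SESSION row above).
# Each call mirrors usage example 1; adjust the flags for other
# languages, paradigms, or linguality settings as needed.
for session in range(1, 6):
    subprocess.run(
        [
            "python", "main.py",
            "--LEARNING_RATE", "3e-5",
            "--BATCH_SIZE", "16",
            "--N_EPOCHS", "20",
            "--SESSION", str(session),
            "--TRAINING", "1",
            "--PARADIGM", "MTKD",
            "--LANGUAGE", "EN",
            "--LINGUALITY", "Monolingual",
        ],
        check=True,  # stop the sweep if any run fails
    )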

Contributing

Contributions are welcome. Possible directions include:

  • MTKD-based monolingual SER methods for English, Finnish, and French.
    • Adapt the method to a new language (e.g., Chinese).
  • MTKD-based multilingual SER method for English, Finnish, and French.
    • Extend the multilingual method to a resource-scarce language (e.g., Bangla).
  • Incorporating heterogeneous Large Audio-Language Models into the MTKD method.
    • Distill the internal knowledge of heterogeneous models into the student (see the sketch after this list).
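
For reference, a multi-teacher distillation objective might look like the following (a minimal sketch, not the repository's implementation; the function name, temperature, and weighting are illustrative assumptions):

import torch
import torch.nn.functional as F

def mtkd_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
    """Hypothetical multi-teacher KD objective: cross-entropy on the
    emotion labels plus the average KL divergence between the student's
    and each teacher's temperature-softened distributions."""
    ce = F.cross_entropy(student_logits, labels)
    kd = 0.0
    for teacher_logits in teacher_logits_list:
        kd = kd + F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # standard temperature scaling (Hinton et al., 2015)
    kd = kd / len(teacher_logits_list)
    return alpha * ce + (1.0 - alpha) * kd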

Citation
