Multi-Teacher Language-Aware Knowledge Distillation for Speech Emotion Recognition

Installation

1. Clone the Repository

git clone https://github.com/aalto-speech/mtkd4ser.git
cd mtkd4ser

2. Create the Environment

conda env create -f environment.yml

3. Activate the Environment

conda activate ser_venv

Usage

1. Multi-Teacher Language-Aware Knowledge Distillation for English Speech Emotion Recognition Using the Monolingual Setup

python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 5 --TRAINING 1 --PARADIGM "MTKD" --LANGUAGE "EN" --LINGUALITY "Monolingual"

2. Conventional Knowledge Distillation for Finnish Speech Emotion Recognition Using the Multilingual Setup

python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 9 --TRAINING 1 --PARADIGM "KD" --LANGUAGE "FI" --LINGUALITY "Multilingual"

3. Vanilla Fine-Tuning for French Speech Emotion Recognition Using the Multilingual Setup

python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 1 --TRAINING 1 --PARADIGM "FT" --LANGUAGE "FR" --LINGUALITY "Multilingual"
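The three commands above differ only in their flag values, so a series of runs can be generated programmatically. The sketch below is a hypothetical helper, not part of the repository. It builds one main.py command per session for a given language, assuming the session ranges listed under Available Configurations and assuming that one run per session is what you want. The names SESSIONS, build_command, and commands_for_language are illustrative.

```python
import shlex

# Session ranges copied from the configuration table in this README.
SESSIONS = {"EN": range(1, 6), "FI": range(1, 10), "FR": range(1, 2)}


def build_command(language, session, paradigm="MTKD", linguality="Monolingual",
                  learning_rate="3e-5", batch_size=16, n_epochs=20, training=1):
    """Return the argument list for one main.py run.

    The flag names match the usage examples; the default values are the ones
    used in those examples, not defaults confirmed from main.py.
    """
    return [
        "python", "main.py",
        "--LEARNING_RATE", str(learning_rate),
        "--BATCH_SIZE", str(batch_size),
        "--N_EPOCHS", str(n_epochs),
        "--SESSION", str(session),
        "--TRAINING", str(training),
        "--PARADIGM", paradigm,
        "--LANGUAGE", language,
        "--LINGUALITY", linguality,
    ]


def commands_for_language(language, **kwargs):
    """Build one command per session available for the given language."""
    return [build_command(language, s, **kwargs) for s in SESSIONS[language]]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for cmd in commands_for_language("EN"):
        print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the ser_venv environment is active.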

4. Available Configurations and Choices

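main.py accepts a range of configurable parameters for training, validation, and evaluation. The table below lists each configuration and its options; select the options that fit your use case. As in the examples above, MTKD selects multi-teacher language-aware knowledge distillation, KD selects conventional knowledge distillation, and FT selects vanilla fine-tuning.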

Configuration    Options
LINGUALITY       Monolingual or Multilingual
LANGUAGE         EN or FI or FR
PARADIGM         MTKD or KD or FT
TRAINING         1 or 0
SESSION          EN: 1-5 or FI: 1-9 or FR: 1
N_EPOCHS         ℤ⁺ (positive integer)
BATCH_SIZE       ℤ⁺ (positive integer)
LEARNING_RATE    ℝ⁺ (positive real number)
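The constraints in the table can be checked before a run starts. The following is a minimal sketch of how they might be expressed with argparse. It is not the parser used in main.py, whose source is not shown here. The helper names (positive_int, positive_float, build_parser, parse_config) are hypothetical, and marking every flag as required is an assumption.

```python
import argparse

# Valid sessions per language, as listed in the configuration table.
SESSION_RANGES = {"EN": range(1, 6), "FI": range(1, 10), "FR": range(1, 2)}


def positive_int(text):
    """Parse a strictly positive integer (the table's ℤ⁺)."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {text!r}")
    return value


def positive_float(text):
    """Parse a strictly positive real number (the table's ℝ⁺)."""
    value = float(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive number, got {text!r}")
    return value


def build_parser():
    # Hypothetical mirror of the documented options; every flag is required here.
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("--LINGUALITY", choices=["Monolingual", "Multilingual"], required=True)
    parser.add_argument("--LANGUAGE", choices=sorted(SESSION_RANGES), required=True)
    parser.add_argument("--PARADIGM", choices=["MTKD", "KD", "FT"], required=True)
    parser.add_argument("--TRAINING", type=int, choices=[0, 1], required=True)
    parser.add_argument("--SESSION", type=positive_int, required=True)
    parser.add_argument("--N_EPOCHS", type=positive_int, required=True)
    parser.add_argument("--BATCH_SIZE", type=positive_int, required=True)
    parser.add_argument("--LEARNING_RATE", type=positive_float, required=True)
    return parser


def parse_config(argv):
    """Parse argv and reject sessions outside the selected language's range."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.SESSION not in SESSION_RANGES[args.LANGUAGE]:
        valid = SESSION_RANGES[args.LANGUAGE]
        parser.error(f"SESSION for {args.LANGUAGE} must be in {valid.start}-{valid.stop - 1}")
    return args
```

The per-language session check runs after parsing because it depends on two flags at once, which argparse's choices argument cannot express on its own.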

Contributing

  • MTKD-based monolingual SER methods for English, Finnish, and French.
    • Adapt the method for a new language (e.g., Chinese).
  • MTKD-based multilingual SER method for English, Finnish, and French.
    • Extend the multilingual method to include a resource-scarce language (e.g., Bangla).
  • Incorporate heterogeneous Large Audio-Language Models in the MTKD method.
    • Distill the internal knowledge of heterogeneous models to the student.

Citation