This repository contains the implementation of our novel approach to music emotion recognition across multiple datasets using Large Language Model (LLM) embeddings. Our method involves:
- Computing LLM embeddings for emotion labels
- Clustering similar labels across datasets
- Mapping music features (MERT) to the LLM embedding space
- Introducing alignment regularization to improve cluster dissociation and enhance generalization to unseen datasets for zero-shot classification
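The zero-shot idea above can be sketched in a few lines: audio features are projected into the LLM label-embedding space, and a clip is classified by the nearest emotion label under cosine similarity. All names, vectors, and the 4-dimensional embeddings below are illustrative stand-ins, not the repository's actual API.

```python
import math

# Pretend LLM embeddings for three emotion labels (illustrative 4-d vectors).
label_embeddings = {
    "happy": [1.0, 0.2, 0.0, 0.1],
    "sad": [-1.0, 0.1, 0.3, 0.0],
    "calm": [0.0, 1.0, -0.2, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_classify(projected_audio, labels):
    """Pick the label whose embedding is closest to the projected audio feature."""
    return max(labels, key=lambda name: cosine(projected_audio, labels[name]))

audio_vec = [0.9, 0.3, 0.05, 0.1]  # stand-in for a MERT feature mapped into label space
print(zero_shot_classify(audio_vec, label_embeddings))  # prints "happy"
```

Because classification reduces to nearest-neighbor search over label embeddings, labels from a dataset never seen during training can be scored the same way, which is what enables the zero-shot evaluation described below.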
- Clone the repository:

  ```bash
  git clone https://github.com/AMAAI-Lab/cross-dataset-emotion-alignment.git
  cd cross-dataset-emotion-alignment
  ```
- Set up a Conda environment:

  ```bash
  conda env create -f environment.yaml
  conda activate cross-dataset-emotion-alignment
  ```
- Option A: Download the MTG-Jamendo, CAL500, and Emotify datasets and place them in the `data/` directory.
- Option B: Use the preprocessed data from Hugging Face.
- If using Option A, preprocess the data and compute MERT features:

  ```bash
  python src/data/preprocess.py --audio_duration 10
  ```
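The `--audio_duration 10` flag presumably controls the length of the audio windows that MERT features are computed over. A minimal stdlib sketch of that kind of fixed-window chunking (not the repository's actual preprocessing code; `segment_audio` is a hypothetical helper):

```python
def segment_audio(waveform, sample_rate, duration=10):
    """Split a 1-D list of samples into fixed-duration segments, dropping the remainder."""
    seg_len = sample_rate * duration
    n_segments = len(waveform) // seg_len
    return [waveform[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

# e.g. 25 samples at 1 Hz with 10 s windows -> two segments, 5 trailing samples dropped
segments = segment_audio([0.0] * 25, sample_rate=1, duration=10)
print(len(segments), len(segments[0]))  # prints "2 10"
```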
Configure your training with various options:
- Baseline dataset training:

  ```bash
  # Single dataset with label augmentation
  python src/train.py data.combine_train_datasets=["mtg"] version=your_version +experiment=label_aug

  # Multi-dataset without label augmentation
  python src/train.py data.combine_train_datasets=["mtg", "CAL500"] version=your_version
  ```
- Label clustering (includes label augmentation by default):

  ```bash
  python src/train.py +cluster=mtg_CAL500 data.train_combine_mode=max_size_cycle
  ```
- Alignment regularization:

  ```bash
  python src/train.py +cluster=reg_mtg_CAL500 model.regularization_alpha=2.5 data.train_combine_mode=max_size_cycle
  ```
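One plausible reading of how `model.regularization_alpha` enters training: the total loss is the classification loss plus `alpha` times an alignment penalty that pulls each projected audio feature toward its label cluster's centroid. The sketch below illustrates that idea only; the function names and the squared-distance penalty are assumptions, not the repository's implementation.

```python
def alignment_penalty(features, centroids, assignments):
    """Mean squared distance from each feature to its assigned cluster centroid."""
    total = 0.0
    for feat, idx in zip(features, assignments):
        total += sum((f - c) ** 2 for f, c in zip(feat, centroids[idx]))
    return total / len(features)

def total_loss(cls_loss, features, centroids, assignments, alpha=2.5):
    # alpha plays the role of regularization_alpha: it trades off fitting the
    # labels against keeping features tight around their cluster centroids.
    return cls_loss + alpha * alignment_penalty(features, centroids, assignments)

# A feature that already sits on its centroid incurs no extra penalty:
print(total_loss(0.5, [[1.0, 0.0]], [[1.0, 0.0]], [0]))  # prints "0.5"
```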
Available options:
- Datasets: `mtg`, `CAL500`, `emotify`
- Clusters: `mtg_CAL500`, `mtg_emotify`, `CAL500_emotify`
- Regularizations: `reg_mtg_CAL500`, `reg_mtg_emotify`, `reg_CAL500_emotify`
To evaluate a trained model, replace `train.py` with `eval.py` in the command and specify the checkpoint path. You can also load the model from a wandb artifact by uncommenting the wandb-related configs in `conf/eval.yaml` and providing your wandb project, entity, and artifact name. By default, evaluation is performed on all three datasets; to evaluate on other datasets, modify `data/multi.yaml`.
Example commands:
- Evaluate on a single dataset:

  ```bash
  python src/eval.py data.combine_train_datasets=["mtg"] ckpt_path=path/to/checkpoint.ckpt
  ```
- Evaluate with clustering:

  ```bash
  python src/eval.py +cluster=mtg_CAL500 data.train_combine_mode=max_size_cycle ckpt_path=path/to/checkpoint.ckpt
  ```
- Evaluate with alignment regularization:

  ```bash
  python src/eval.py +cluster=reg_mtg_CAL500 model.regularization_alpha=2.5 data.train_combine_mode=max_size_cycle ckpt_path=path/to/checkpoint.ckpt
  ```
We present our results for both segment-level and song-level predictions. Song-level results are obtained through majority voting of segment-level predictions.
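The majority-voting step can be written directly with the standard library; the ties and label names here are illustrative:

```python
from collections import Counter

def song_level_prediction(segment_preds):
    """Majority vote over a song's segment-level label predictions."""
    return Counter(segment_preds).most_common(1)[0][0]

print(song_level_prediction(["happy", "sad", "happy", "happy", "calm"]))  # prints "happy"
```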
Segment-level results:

| Phases | Train-MTG+CAL, Test-EMO | Train-CAL+EMO, Test-MTG | Train-EMO+MTG, Test-CAL |
|---|---|---|---|
| Baseline 1 | 0.324 | 0.0215 | 0.255 |
| Baseline 2 (Clustering) | 0.341 | 0.0240 | 0.262 |
| Alignment Regularization | 0.402 (λ = 2.5) | 0.0248 (λ = 2.5) | 0.262 (λ = 1) |
Song-level results:

| Phases | Train-MTG+CAL, Test-EMO | Train-CAL+EMO, Test-MTG | Train-EMO+MTG, Test-CAL |
|---|---|---|---|
| Baseline 1 | 0.315 | 0.0129 | 0.252 |
| Baseline 2 (Clustering) | 0.346 | 0.0175 | 0.267 |
| Alignment Regularization | 0.400 (λ = 2.5) | 0.0202 (λ = 2.5) | 0.229 (λ = 1) |
These results demonstrate the effectiveness of our approach, particularly the alignment regularization technique, in improving cross-dataset generalization for music emotion recognition tasks.