nlp4musa_melscribe

This repository provides Python code to reproduce the experiments from the article Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation, accepted for publication at NLP4MusA 2024.

For a summary of this project, please consult the poster or slides.

Setup

git clone https://github.com/deezer/nlp4musa_melscribe.git
cd nlp4musa_melscribe

Install the requirements:

pip install -r requirements.txt

The LP-MusicCaps datasets are available for download on Hugging Face (MC, MTT, MSD). Each dataset should be loaded and exported to CSV files, one per split, as shown below for LP-MusicCaps-MTT:

import os

from datasets import load_dataset

# Create the target directory before exporting the splits.
os.makedirs('data/LP-MusicCaps-MTT', exist_ok=True)

ds = load_dataset("seungheondoh/LP-MusicCaps-MTT")
ds['test'].to_csv('data/LP-MusicCaps-MTT/test.csv')
ds['train'].to_csv('data/LP-MusicCaps-MTT/train.csv')
ds['valid'].to_csv('data/LP-MusicCaps-MTT/valid.csv')
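To export all three datasets in one go, a sketch along the same lines (it assumes the MC and MSD datasets use the same seungheondoh/ naming on Hugging Face and expose the same split names; note the authentication requirement for MSD below):

import os

from datasets import load_dataset

# Export every split of each dataset to data/<dataset>/<split>.csv.
for name in ["LP-MusicCaps-MC", "LP-MusicCaps-MTT", "LP-MusicCaps-MSD"]:
    ds = load_dataset(f"seungheondoh/{name}")
    os.makedirs(f"data/{name}", exist_ok=True)
    for split in ds:
        ds[split].to_csv(f"data/{name}/{split}.csv")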

LP-MusicCaps-MSD is a gated dataset, so you must be authenticated to access it.
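After requesting access on the dataset page, you can authenticate from Python with the huggingface_hub library before calling load_dataset (a minimal sketch; running huggingface-cli login in a terminal works as well):

from huggingface_hub import login

# Prompts for a Hugging Face access token with permission to the gated
# dataset; alternatively, pass it directly: login(token="hf_...").
login()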

Download the fine-tuned models (the cross-encoder and the bi-encoder) from Zenodo:

wget https://zenodo.org/records/14289764/files/models.zip
unzip models.zip -d models/

Reproduce paper results

Evaluate our model

python src/eval_our_model.py --output_path results/results_our_model.json --sources lpms-mtt lpms-msd lpms-mc lpms-mc-rephrased --input_path data/ --our_model_path models/bi-encoder-lpmusicaps-msmarco-bert-base-dot-v5/
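Assuming the fine-tuned bi-encoder follows the sentence-transformers format (its base checkpoint, msmarco-bert-base-dot-v5, is a sentence-transformers model trained for dot-product similarity), it can also be queried directly. A minimal sketch, where the request and descriptor strings are hypothetical examples:

from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned bi-encoder downloaded from Zenodo.
model = SentenceTransformer("models/bi-encoder-lpmusicaps-msmarco-bert-base-dot-v5/")

# Embed a free-form music request and a few candidate descriptors.
query = model.encode("calm acoustic guitar for a rainy afternoon", convert_to_tensor=True)
descriptors = model.encode(["mellow", "acoustic", "aggressive metal"], convert_to_tensor=True)

# Rank the descriptors by dot-product similarity.
print(util.dot_score(query, descriptors))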

Evaluate text encoder baselines

python src/eval_text_encoders.py --output_path results/results_text_encoders.json --sources lpms-mtt lpms-msd lpms-mc lpms-mc-rephrased --input_path data/

Evaluate text encoder baselines from multimodal models

Set up the baselines. Depending on the environment, pip install -e src/music-text-representation/ may fail with an exception about the deprecated sklearn package; as suggested in the error message, replace sklearn with scikit-learn in that repository's setup.py (see the sketch after these commands):

wget https://huggingface.co/lukewys/laion_clap/resolve/main/music_audioset_epoch_15_esc_90.14.pt -P src/laion-clap/
git clone https://github.com/seungheondoh/music-text-representation.git src/music-text-representation/
pip install -e src/music-text-representation/
wget https://zenodo.org/record/7322135/files/mtr.tar.gz -P src/music-text-representation/
tar -zxvf src/music-text-representation/mtr.tar.gz -C src/music-text-representation/
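If the pip install fails with the sklearn exception mentioned above, here is a minimal patch sketch, to run after the git clone and before the pip install (it assumes the dependency is listed under the literal name sklearn and that the string does not occur elsewhere in the file):

from pathlib import Path

# One-off fix: swap the deprecated "sklearn" requirement for "scikit-learn"
# in the cloned repository's setup.py.
setup_py = Path("src/music-text-representation/setup.py")
setup_py.write_text(setup_py.read_text().replace("sklearn", "scikit-learn"))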

Run the evaluation script:

python src/eval_text_encoder_multimodal_models.py --output_path results/results_text_encoders_multimodal.json --sources lpms-mtt lpms-msd lpms-mc lpms-mc-rephrased --input_path data/ --ttmr_model_path src/music-text-representation/mtr/ --clap_model_path src/laion-clap/music_audioset_epoch_15_esc_90.14.pt

Fine-tune a model from scratch

Generate training data:

python src/generate_train_data.py --input_path data/ --output_path data/training_gpl --sources lpms-mtt lpms-msd --random_seed=42 --docs_per_query=3

Train the model with the Generative Pseudo Labeling (GPL) method:

python -m gpl.train \
    --path_to_generated_data "data/training_gpl" \
    --base_ckpt "msmarco-bert-base-dot-v5" \
    --gpl_score_function "cos_sim" \
    --batch_size_gpl 4 \
    --gpl_steps 140000 \
    --output_dir "models/nlp4musa_seed42" \
    --retrievers "msmarco-distilbert-base-v3" "msmarco-MiniLM-L-6-v3" \
    --retriever_score_functions "cos_sim" \
    --negatives_per_query 30 \
    --cross_encoder "models/cross-encoder-musiccaps-ms-marco-MiniLM-L-6-v2/" \
    --qgen_prefix "qgen" \
    --max_seq_length 512

As described in the paper, we fine-tuned a domain-specific cross-encoder on human-annotated data from the MusicCaps dataset, namely the model models/cross-encoder-musiccaps-ms-marco-MiniLM-L-6-v2. The cross-encoder predicts a similarity score between a longer music-related text (e.g., a song description or a user request) and a music descriptor (e.g., tags). It serves as a teacher to generate soft labels for the training data, which are then used to train the bi-encoder.
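For illustration, a minimal sketch of scoring (text, descriptor) pairs with this cross-encoder via sentence-transformers; the pair texts are hypothetical examples:

from sentence_transformers import CrossEncoder

# Load the teacher cross-encoder downloaded from Zenodo.
cross_encoder = CrossEncoder("models/cross-encoder-musiccaps-ms-marco-MiniLM-L-6-v2/")

# Higher scores indicate a better match between description and descriptor.
pairs = [
    ("A slow, melancholic piano piece with soft vocals", "sad"),
    ("A slow, melancholic piano piece with soft vocals", "upbeat dance"),
]
print(cross_encoder.predict(pairs))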

Paper

Please cite our paper if you use this data or code in your work:

@InProceedings{Epure2024Harnessing,
  title={Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation},
  author={Epure, Elena V. and Meseguer-Brocal, Gabriel and Afchar, Darius and Hennequin, Romain},
  booktitle={Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA2024)},
  month={November},
  year={2024},
  publisher={Association for Computational Linguistics},
}
