In this repository, we provide a pipeline to extract a comprehensive set of music-specific features from MIDI files. These features succinctly characterize the musical content, encompassing tempo, chord progression, time signature, instrument presence, genre, and mood. We also provide a script to generate captions from your own collection of MIDI files.
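For a flavour of the kind of features involved, here is a minimal sketch of extracting a few of them with pretty_midi (an illustrative choice for this README; the actual extraction lives in pipeline.py and may use different tools):

```python
import pretty_midi

pm = pretty_midi.PrettyMIDI("song.mid")

# Global tempo estimate in BPM.
tempo = pm.estimate_tempo()

# All time signature changes, e.g. [(4, 4)].
time_signatures = [(ts.numerator, ts.denominator) for ts in pm.time_signature_changes]

# Names of the (non-drum) instruments present.
instruments = [
    pretty_midi.program_to_instrument_name(inst.program)
    for inst in pm.instruments
    if not inst.is_drum
]

print(tempo, time_signatures, instruments)
```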
To directly download the MidiCaps dataset, please visit our Hugging Face dataset page: https://huggingface.co/datasets/amaai-lab/MidiCaps.
The code below will help you extract captions from your own collection of MIDI files, following the framework described in our paper.
git clone https://github.com/AMAAI-Lab/MidiCaps.git
cd MidiCaps
conda create -n midicaps python=3.9
conda activate midicaps
pip install -r requirements.txt
python pipeline.py --config config.cfg
You will also need to download the models we use for genre and mood extraction (their paths are indicated in config.cfg), available at the following links (a minimal loading sketch follows the list):
- genre model and metadata : https://essentia.upf.edu/models/classification-heads/mtg_jamendo_genre/
- mood model and metadata : https://essentia.upf.edu/models/classification-heads/mtg_jamendo_moodtheme/
- emb model : https://essentia.upf.edu/models/music-style-classification/discogs-effnet/
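For reference, here is a minimal sketch of how these models are typically loaded and chained with Essentia; it mirrors the usage documented on the Essentia model pages, but the graph file names and output node are assumptions, and the pipeline's actual code may differ:

```python
from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs, TensorflowPredict2D

# Load audio at 16 kHz, as expected by the Discogs-EffNet embedding model.
audio = MonoLoader(filename="rendered.wav", sampleRate=16000, resampleQuality=4)()

# Compute embeddings with the Discogs-EffNet backbone (the emb model above).
embedding_model = TensorflowPredictEffnetDiscogs(
    graphFilename="discogs-effnet-bs64-1.pb", output="PartitionedCall:1"
)
embeddings = embedding_model(audio)

# Feed the embeddings to a classification head, e.g. the genre model.
genre_model = TensorflowPredict2D(graphFilename="mtg_jamendo_genre-discogs-effnet-1.pb")
activations = genre_model(embeddings)  # per-frame class activations
```

The mood model is applied the same way, swapping in the mtg_jamendo_moodtheme graph; the class labels come from the accompanying metadata files.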
Also, you will need to download FluidR3_GM.sf2 from https://keymusician01.s3.amazonaws.com/FluidR3_GM.zip and update the .sf2 file path referenced on line 35 accordingly.
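The soundfont is presumably used to render MIDI to audio before the Essentia models are applied. A minimal sketch of that rendering step, assuming the midi2audio wrapper around FluidSynth (the pipeline may invoke fluidsynth differently):

```python
from midi2audio import FluidSynth

# Render a MIDI file to a wav file using the FluidR3_GM soundfont.
fs = FluidSynth(sound_font="FluidR3_GM.sf2", sample_rate=16000)
fs.midi_to_audio("song.mid", "rendered.wav")
```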
The output of this step is all_files_output.json. From this we generate test.json to do in-context learning for Claude 3. We provide a sample test.json and a basic script to run Claude 3. You will need to add your Claude 3 API key as the environment variable ANTHROPIC_API_KEY:
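To inspect the extracted features, you can load the JSON directly. A minimal sketch, assuming the top level is a list with one record per MIDI file (the field names below are illustrative placeholders, not the exact schema produced by pipeline.py):

```python
import json

# Load the features extracted by pipeline.py.
with open("all_files_output.json") as f:
    records = json.load(f)

# Peek at a few records; keys such as "genre", "mood", and "tempo" are
# placeholders for the features described in the paper.
for rec in records[:3]:
    print({k: rec.get(k) for k in ("genre", "mood", "tempo")})
```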
export ANTHROPIC_API_KEY=<your claude 3 key>
python caption_claude.py
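For reference, the core of such a script is a single call to the anthropic SDK. A minimal sketch, assuming the in-context examples from test.json have already been assembled into a prompt string (the model id and prompt construction are illustrative; see caption_claude.py for the actual implementation):

```python
import anthropic

# The client picks up ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# `prompt` would hold the in-context examples plus the features of the
# MIDI file to caption; shown here as a placeholder.
prompt = "Given these musical features, write a one-sentence caption: ..."

message = client.messages.create(
    model="claude-3-opus-20240229",  # assumption: any Claude 3 model id works
    max_tokens=300,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```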
Please change line 59 in caption_claude.py to your preferred location.
If you use MidiCaps or code from this repo, please cite our paper:
@article{Melechovsky2024,
  author  = {Jan Melechovsky and Abhinaba Roy and Dorien Herremans},
  title   = {MidiCaps: A Large-scale MIDI Dataset with Text Captions},
  year    = {2024},
  journal = {arXiv:2406.02255}
}