A PyTorch implementation of the musicnn model by Jordi Pons [1], a CNN-based audio feature extractor and tagger. This implementation is still a WIP and does not strictly follow the musicnn architecture.
git clone https://github.com/ilaria-manco/music-audio-tagging-pytorch
Create a virtual environment and activate it
python3 -m venv env
source env/bin/activate
Install the required dependencies
pip install -r requirements.txt
If you want to retrain the model on the MagnaTagATune (MTT) dataset, you'll first have to download it from here. Then edit config_file.py to point to the paths where the data is stored, and preprocess the audio files by running
python run_preprocessing.py --mtt
Then you can use the following script for training, after changing the parameters in config_file.py if necessary.
python run_training.py
The evaluation script computes two metrics, mean ROC AUC and mean PR AUC, and produces a plot of the two metrics over the TPR vs FPR. The evaluation is done on 5328 test samples from the MTT.
python evaluate.py --model_number
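To make the two metrics concrete, here is a minimal, dependency-free sketch of how mean ROC AUC and mean PR AUC can be computed for a multi-label tagger. It is not the code in evaluate.py: the function names (roc_auc, average_precision, mean_over_tags) and the toy data are hypothetical, and it assumes that PR AUC is estimated as average precision and that the mean is taken over tags.

```python
# Illustrative sketch only; names and data are hypothetical, not taken from this repo.

def roc_auc(y_true, y_score):
    """ROC AUC for one tag, via the rank-sum (Mann-Whitney U) formulation."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, y_score):
    """Average precision for one tag (assumed here as the PR AUC estimate).

    Tied scores are ranked in arbitrary order in this simplified version.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for k, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive occurs
    return total / n_pos


def mean_over_tags(metric, y_true, y_score):
    """Average a per-tag metric over the columns of (samples x tags) lists."""
    n_tags = len(y_true[0])
    per_tag = [
        metric([row[t] for row in y_true], [row[t] for row in y_score])
        for t in range(n_tags)
    ]
    return sum(per_tag) / n_tags


# Toy example: 4 clips, 2 tags (ground truth and predicted probabilities).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.15], [0.3, 0.1]]

mean_roc_auc = mean_over_tags(roc_auc, y_true, y_score)
mean_pr_auc = mean_over_tags(average_precision, y_true, y_score)
print(f"mean ROC AUC: {mean_roc_auc:.4f}")
print(f"mean PR AUC:  {mean_pr_auc:.4f}")
```

In practice a library routine such as scikit-learn's roc_auc_score and average_precision_score would usually be preferred; the pure-Python version is only meant to show what the numbers mean.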
This repo also contains 3 pre-trained models ready to use.
You can extract the output tags by running
python extract_features.py --input_audio --output_path --model_number
For model_number, 2 is the one found to perform best in the preliminary evaluation and is therefore recommended.
python extract_features.py --input_audio --num_samples
[1] Pons, Jordi, and Xavier Serra. "musicnn: Pre-trained convolutional neural networks for music audio tagging." arXiv preprint arXiv:1909.06654 (2019).