I have not yet run a complete training procedure. Once I do and verify the results, the plan is to use this classifier for genre-based music generation, which, to the best of my knowledge, has not been done yet.
If you need assistance running the project or have a question, please email me at [email protected]
The ideal goal of this project is to be able to say "This part of the song has elements of jazz, progressive rock and a bit of grunge." This could be achieved by defining the problem as multi-output classification.
The deep model is based on Convolutional Recurrent Neural Networks for Music Classification (Keunwoo Choi, George Fazekas, Mark Sandler, Kyunghyun Cho; Dec 2016) [1], i.e. it uses a convolutional recurrent neural network (CRNN) for a multi-output classification task (tagging each music piece with a subset of labels).
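To make the multi-output framing concrete, here is a minimal sketch of how a set of genre tags could become a training target, and how per-genre scores could be turned back into tags. The genre list, the function names and the 0.5 threshold are assumptions made for illustration; they are not taken from this project's code.

```python
# Hypothetical genre vocabulary, chosen only for illustration.
GENRES = ["jazz", "progressive rock", "grunge", "hip-hop"]


def encode_tags(tags, vocabulary=GENRES):
    """Turn a collection of genre tags into a multi-hot target vector."""
    tag_set = set(tags)
    return [1.0 if genre in tag_set else 0.0 for genre in vocabulary]


def decode_scores(scores, vocabulary=GENRES, threshold=0.5):
    """Keep every genre whose independent score reaches the threshold.

    The scores are assumed to come from one sigmoid output per genre,
    so they do not need to sum to 1 and several genres can be active.
    """
    return [genre for genre, score in zip(vocabulary, scores) if score >= threshold]


target = encode_tags(["jazz", "grunge"])
predicted = decode_scores([0.91, 0.62, 0.18, 0.03])
print(target)
print(predicted)
```

Unlike a softmax over genres, this setup lets a single excerpt carry several labels at once, which is what the "jazz, progressive rock and a bit of grunge" goal needs.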
To run all parts of this project, you will need the following additional Python packages (Python 3.6 is recommended):
- keras - build and train the high-level model
- librosa - extract mel-spectrograms
- pandas - analyze FMA metadata
- numpy - efficiently work with linear algebra operations
- tensorflow (GPU recommended) - modify keras backend
- matplotlib - plot various graphs and use it to extract librosa spectrograms
Mel-spectrograms are extracted from .mp3s and used as model inputs. An example of such a spectrogram is:
However, the images fed to the model are generated a bit differently: the matrix of spectrogram values is dumped into a grayscale image. The information is preserved this way, and the convolution gets one input channel instead of three. An example of such an image is:
Other spectrograms could also be used, as described and compared in detail in [5]. In this work, in addition to mel-spectrograms, raw audio input will also be tested [6].
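The grayscale dump described above could look roughly like the sketch below. It assumes the spectrogram is already available as a 2-D array (for example a dB-scaled mel-spectrogram computed with librosa, which is not shown here). The function name and the min-max scaling to 8 bits are assumptions, not necessarily what this project does; note that 8-bit quantization keeps the structure of the spectrogram but reduces its numeric precision.

```python
import numpy as np


def spectrogram_to_grayscale(spec):
    """Min-max scale a 2-D spectrogram to uint8 pixels (a single channel)."""
    spec = np.asarray(spec, dtype=np.float64)
    low, high = spec.min(), spec.max()
    if high == low:
        # A constant matrix carries no contrast; map it to black.
        return np.zeros(spec.shape, dtype=np.uint8)
    scaled = (spec - low) / (high - low)  # values in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# Tiny stand-in for a real spectrogram: rows are mel bands, columns are frames.
demo = spectrogram_to_grayscale([[-80.0, -40.0], [-20.0, 0.0]])
print(demo.shape, demo.dtype)
```

The resulting array has shape `(n_mels, n_frames)`, so a convolutional layer would see it as an image with one channel rather than the three channels of an RGB plot.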
The project uses the FMA dataset (A Dataset For Music Analysis) [2]. It is a collection of freely available MP3s (under Creative Commons licenses) that is well suited to research projects and is (currently) the only publicly available music dataset of its kind. The distribution of the top 16 genres is shown in the following histogram:
- Take a look at and download the FMA dataset metadata (342 MiB). For more details, check this repo.
- Then download small or medium; try the smaller versions first to set things up and then switch to large. I won't use the full version, because its input images would have various sizes and it is too large for my computing resources anyway. Besides, I believe there is more than enough information in the 30 s trimmed tracks.
- Run to build, compile and train a keras model (the CRNN architecture mentioned above).
Project structure:
- data/
- fma_{size}/
- 000/
- 001/
- fma_metadata/
- genres.csv
- tracks.csv
- fma_{size}/
- in/
- mel-specs/
- 000/
- 001/
- metadata/
- test.csv
- train.csv
- valid.csv
- mel-specs/
- out/
- graphs/
- logs/
- src/
- main.py
- mel-spec.py
- metadata.py
- model.py
- utility.py
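The numbered folders (000/, 001/, ...) follow the FMA convention: a track ID is zero-padded to six digits and the first three digits name the folder. A small helper for locating a file could be sketched as follows; the function name, the `data/fma_small` root and the idea of reusing the same layout for spectrogram images are assumptions for illustration.

```python
import os


def track_file_path(root_dir, track_id, extension=".mp3"):
    """Build the path of a track file inside an FMA-style folder tree."""
    name = "{:06d}".format(track_id)  # e.g. 2 -> "000002"
    # The first three digits select the sub-folder, e.g. "000".
    return os.path.join(root_dir, name[:3], name + extension)


# Hypothetical root folder; adjust it to wherever the dataset was unpacked.
print(track_file_path("data/fma_small", 2))
print(track_file_path("data/fma_small", 1234))
```

Passing a different `extension` (for instance an image extension) would give the matching location in a mirrored mel-specs tree, assuming that tree uses the same numbering.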
I have not run the whole training process yet.
The source code for this project also contains a separate folder for the CrowdAI competition. The main focus of this project over the next 60 days will be gaining a better position on the leaderboard.
[1] CRNN for Music Classification
[2] FMA: A Dataset For Music Analysis
[3] Music Information Retrieval (origin of "MIR", Downie)
[4] A Tutorial on Deep Learning for Music Information Retrieval
[5] Comparison on Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging
[6] End-to-end learning for music audio tagging at scale (1D convolution)
For broader references on music information retrieval, check https://github.com/ybayle/awesome-deep-learning-music.