Training an LSTM on Indonesian folk songs in MIDI format to compose new MIDI music.
Requirements:
- numpy
- pandas
- pytorch==0.4.1
- plac
- tqdm
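These can be installed with pip; note that PyTorch is published on PyPI as `torch`, and an old version like 0.4.1 may require a platform-specific wheel from the PyTorch site:

```
pip install numpy pandas torch==0.4.1 plac tqdm
```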
Convert your MIDI files into .csv using Midicsv[1] and put them in a folder; by default, this is the `dataset` folder. It is recommended to remove channels that contain repetitive music (usually background sounds such as drums and snares) so that the RNN does not produce uninteresting, repetitive sound. I have already performed this data cleaning on the Indonesian Folk Song dataset.
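For reference, the Midicsv[1] package ships a `midicsv` command-line tool, so the conversion is one command per file (the filename here is just an example):

```
midicsv mysong.mid dataset/mysong.csv
```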
The training is executed through a command-line interface (CLI). Check the CLI help for documentation:

```
python train.py -h
```

You may also use the default values by simply running:

```
python train.py
```
You can visualize the model performance using the `Music Composer.ipynb` notebook while training.

Note: the program will keep running until you interrupt it with `Ctrl+C`.
The training parameters are:

| Parameter | Flag | Description |
| --- | --- | --- |
| `n_hidden` | `-nh` | Number of hidden units. |
| `n_layers` | `-nl` | Number of hidden layers. |
| `bs` | `-bs` | Batch size. |
| `seq_len` | `-sl` | Length of the input sequence. |
| `lr` | `-lr` | Learning rate. |
| `d_out` | `-do` | Dropout rate. |
| `save_every` | `-se` | Number of steps between model saves. |
| `print_every` | `-pe` | Number of steps between prints of training information (loss, etc.). |
| `name` | `-o` | Folder name for the model. A new folder with this name is created if it is not found. |
| `midi_source_folder` | `-i` | Folder name for the data. It must contain the .csv files in Midicsv[1] format. |
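For example, a run that overrides a few of these flags (the values here are purely illustrative, not tuned recommendations):

```
python train.py -nh 512 -nl 2 -bs 32 -sl 100 -lr 0.001 -do 0.5 -o mymodel -i dataset
```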
Use the `Music Composer.ipynb` notebook: load the model, then set your desired configuration.
I have prepared some generated music in the `sample` folder. Use Midicsv[1] to convert it back to a MIDI file; you can then open it with any common MIDI player, or try MidiEditor[2].
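The reverse conversion uses the `csvmidi` tool from the same Midicsv[1] package, for example:

```
csvmidi sample/mymusic.csv sample/mymusic.mid
```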
The composing parameters are:

| Parameter | Description |
| --- | --- |
| `fname` | Name for the generated music (.csv). You need to convert it back to .mid using Midicsv[1]. |
| `prime` | Prime characters from which the RNN starts composing. |
| `top_k` | Take the top k most probable predictions and choose randomly among them (see the sampling sketch below). `top_k = 1` means we always use the most probable character. A higher `top_k` produces more creative music (relative to the dataset); I would recommend around 3-5. If `top_k` is too large, the prediction may not follow the format needed to convert back to .mid. |
| `compose_len` | Number of characters to compose. One music note needs 8-14 characters. |
| `channel` | The MIDI channels and track numbers. For example, `[0, 1, 2]` means three channels, on Tracks 0, 1, and 2. |
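To make the `top_k` trade-off concrete, here is a minimal, self-contained sketch of top-k sampling as commonly used in char-RNNs; it illustrates the idea, not this repository's exact code:

```python
import numpy as np

def sample_top_k(probs, top_k=4):
    """Pick the next character index from the top_k most probable candidates.

    probs: 1-D array of next-character probabilities from the network.
    top_k=1 is greedy decoding (always the most probable character);
    larger values add variety but raise the chance of format-breaking typos.
    """
    top_idx = np.argsort(probs)[-top_k:]           # indices of the k best candidates
    top_p = probs[top_idx] / probs[top_idx].sum()  # renormalize over those candidates
    return np.random.choice(top_idx, p=top_p)

# Toy usage: a fake distribution over a 5-character vocabulary.
print(sample_top_k(np.array([0.5, 0.2, 0.15, 0.1, 0.05]), top_k=3))
```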
- If `Retry music composing...` keeps popping up: the model is not following the data format. For example, we want `C5-512-1024`, but the model generates `C5--512-1024`; by analogy with a char-RNN for paragraph generation, it is like a typo (see the sketch after this list). You can try using fewer channels, decreasing `top_k`, decreasing `compose_len`, training longer, or getting more data. A lower `top_k` helps because the model then follows the proper format of the data instead of generating characters more randomly; longer training and more data likewise help it properly learn the format. A lower `compose_len` simply avoids the problem before it happens. Fewer channels is a must: the more channels you try to generate, the higher the chance the model breaks the format.
- If the model replicates music from the dataset: it is overfitting. You can try decreasing the model complexity (lower `n_hidden`, `n_layers`, `seq_len`), choosing a model from an earlier epoch (with higher loss), or increasing `d_out`.
- If the generated music sounds like gibberish: your data may be too complex. Try more homogeneous data.
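To see why such a "typo" breaks conversion, here is a small check; the `C5-512-1024` pattern is inferred from the example above and is only illustrative, not a specification of the actual format:

```python
import re

# Pattern inferred from the C5-512-1024 example: note name, then two numbers.
NOTE_RE = re.compile(r"^[A-G]#?\d+-\d+-\d+$")

for note in ["C5-512-1024", "C5--512-1024"]:
    verdict = "ok" if NOTE_RE.match(note) else "malformed -> would trigger a retry"
    print(f"{note}: {verdict}")
```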
Have a listen:
- https://github.com/WiraDKP/RNN_MIDI_Composer/blob/master/sample/mymusic.mid
I used a list of Indonesian folk songs in MIDI format[3]. After some preprocessing, this results in the .csv files in the `dataset` folder. The trained model is in the `model` folder, and the music it generates is in the `sample` folder.
Note: you do not have to push the loss to a minimum to generate good music.
This project would not have succeeded without these references. Thank you!