Skip to content

Latest commit

 

History

History
27 lines (21 loc) · 1.24 KB

README.md

File metadata and controls

27 lines (21 loc) · 1.24 KB

pix2pix

Description

  • This is a Tensorflow implementaion of Audio source separation (mixture to vocal) using the pix2pix. I pre-processed raw data(mixture and vocal pair dataset) to spectrogram that can be treated as 2-dimensional image, then train the model. See the file hyperparams.py for the detailed hyperparameters.

Requirements

  • NumPy >= 1.11.1
  • TensorFlow >= 1.0.0
  • librosa

Data

I used DSD100 dataset which consists of pairs of mixture audio files and vocal audio files. The complete dataset (~14 GB) can be downloaded here.

File description

  • hyperparams.py includes all hyper parameters that are needed.
  • data.py loads training data and preprocess it into units of raw data sequences.
  • modules.py contains all methods, building blocks and skip connections for networks.
  • networks.py builds networks.
  • train.py is for training.

Training the network

  • STEP 1. Adjust hyper parameters in hyperparams.py if necessary.
  • STEP 2. Download and extract DSD100 data as mentioned above at 'data' directory, and run data.py.
  • STEP 3. Run train.py.

Notes

  • I didn't implement evaluation code yet, but i will update soon.