Skip to content

vaniastankov/Master-Thesis-code

Repository files navigation

Natural Audio Data Augmentation Techniques

Type: Master's Thesis

Author: Ivan Stankov

1st Examiner: Prof. Dr. Stefan Lessmann

2nd Examiner: Prof. Dr. Benjamin Fabian

results

Table of Content

Summary

In various machine learning and data science projects, audio data stands aside as a separate yet understudied domain. Quite often, one might come across a study that treats audio as image data without taking its specificity into consideration. As it is hard to label audio recordings, various data augmentation techniques are crucial in this domain, just like in others. However, research on this topic is limited. In this work, multiple, supposedly most beneficial, audio data techniques are compared with each other and with a few novel ones, which consider the nature of an audio signal.

Keywords: Data Augmentation, Audio Data, Machine Learning, Audio Classification, Model Training.

Full text: The full text for this work is available here.

Working with the repo

Dependencies

The project was built using Python 3.9.6. Nonetheless, there should be little to no problem in reproducing it with a different version.

Code dependencies are stated in requirements.txt. All imports are managed in setup.py.

Setup

  1. Clone this repository
  2. Install requirements
pip install --upgrade pip
pip install -r requirements.txt

(In some cases pip has to be substituted for pip3 or conda).

  1. Run code in desired Jupyter Notebook

Reproducing results

In order to reproduce the results you should do the following:

  1. Upon cloning this repository, you should create a folder called Data. In it, you should place a copy of the dataset used in this work.

  2. After that, you can start experimenting with augmentation techniques and visualizations used in this work

  3. In order to train a model, you can run Models Notebook. Note that in this work it was executed in Google Colab. In order to reuse it on your account, a copy of the preprocessed dataset (which is built after running PreprocessingNotebook) has to be uploaded to your Google Drive and made accessible to the Notebook execution environment. Execution on alternative platforms and/or local machines should follow similar steps.

  4. As a result of executing Models Notebook, you will obtain a pickled dataframe (for example, like this) that stores true and predicted labels for each augmentation technique. The datafame looks as follows:

slice_file_name fold class source dist mixup imixup room spectrum warp delay all
0 57320-0-0-39.wav 1 air_conditioner o children_playing air_conditioner air_conditioner air_conditioner children_playing children_playing dog_bark air_conditioner
1 134717-0-0-6.wav 1 air_conditioner o street_music air_conditioner air_conditioner air_conditioner air_conditioner engine_idling air_conditioner engine_idling
2 57320-0-0-31.wav 1 air_conditioner o dog_bark air_conditioner air_conditioner dog_bark dog_bark dog_bark dog_bark air_conditioner
  1. Final metrics can be recalculated using Evaluation Notebook.

Training code

The training code is provided as part of the Models Notebook.

Evaluation code

Evaluation Notebook is responsible for metrics and resulting plots.

Pretrained models

Models used in this work can benefit from fine-tuning pre-trained weights or built from scratch. In either case, there is no need to separately load weights, these can be downloaded automatically by a library used in this work. For more details, see:TensorFlow Docs.

Results

Proper evaluation of the prediction result is provided in thesis itself. However, Evaluation Notebook contains excessive data that you can draw your conclusion from.

Project structure

The project has a following structure:

├── Box Plot for EfficientNetV2B1 Accuracies.png    -- Example of the results
├── EfNetV2B1Res.pkl                                -- EfficientNetV2B1 predictions
├── EfNetV2B2Res.pkl                                -- EfficientNetV2B2 predictions
├── Examples                                     -- Real audio data examples subfolder
│   ├── Hall Recording Example.mp3                  
│   ├── New Recording.mp3                           
│   ├── Original Recording Example.mp3              
│   └── Street Recording Example.mp3               
├── README.md                                    
├── imgs                                         -- Subfolder storing high res plots
├── Data                                         -- subfolder storing data
│   ├── pkl                                      -- Subfolder with preprocessed data copies
│   └── UrbanSound8K                             -- Subfolder with original data
├── requirements.txt                                -- Library requirements
└── src                                          -- Code folder
    ├── Evaluation.ipynb                            -- Evaluation of model predictions
    ├── Models.ipynb                                -- Model training notebook
    ├── Motivation.ipynb                            -- Notebook explaining the domain
    ├── Preprocessing.ipynb                         -- Data preprocessing Notebook
    └── py                                       -- Subfolder for supporting .py files
        ├── augmenters.py                           -- Augmentation Techniques
        ├── batchproc.py                            -- Functions for Batch processing
        ├── helpers.py                              -- Plots and supporting functions
        └── setup.py                                -- Imports and global variable definitions

Note that Data and imgs folders are excluded from the repo due to their size.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published