Type: Master's Thesis
Author: Ivan Stankov
1st Examiner: Prof. Dr. Stefan Lessmann
2nd Examiner: Prof. Dr. Benjamin Fabian
In machine learning and data science projects, audio data forms a distinct yet understudied domain. Studies frequently treat audio as image data without taking its specifics into account. Because labelling audio recordings is laborious, data augmentation techniques are as crucial in this domain as in any other, yet research on them remains limited. In this work, several of the reportedly most beneficial audio data augmentation techniques are compared with each other and with a few novel ones that take the nature of an audio signal into account.
Keywords: Data Augmentation, Audio Data, Machine Learning, Audio Classification, Model Training.
Full text: The full text for this work is available here.
The project was built using Python 3.9.6. Nonetheless, there should be little to no problem in reproducing it with a different version.
Code dependencies are stated in requirements.txt. All imports are managed in setup.py.
- Clone this repository
- Install requirements
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
(In some cases `pip` has to be replaced with `pip3` or `conda`.)
- Run the code in the desired Jupyter Notebook
In order to reproduce the results, you should do the following:
- Upon cloning this repository, create a folder called Data and place a copy of the dataset used in this work into it.
- After that, you can start experimenting with the augmentation techniques and visualizations used in this work.
- In order to train a model, run the Models Notebook. Note that in this work it was executed in Google Colab. To reuse it on your own account, a copy of the preprocessed dataset (which is built by running the Preprocessing Notebook) has to be uploaded to your Google Drive and made accessible to the Notebook execution environment (see the Drive-mount sketch after this list). Execution on alternative platforms and/or local machines should follow similar steps.
- As a result of executing the Models Notebook, you will obtain a pickled dataframe (for example, like this) that stores true and predicted labels for each augmentation technique. The dataframe looks as follows:
|   | slice_file_name | fold | class | source | dist | mixup | imixup | room | spectrum | warp | delay | all |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 57320-0-0-39.wav | 1 | air_conditioner | o | children_playing | air_conditioner | air_conditioner | air_conditioner | children_playing | children_playing | dog_bark | air_conditioner |
| 1 | 134717-0-0-6.wav | 1 | air_conditioner | o | street_music | air_conditioner | air_conditioner | air_conditioner | air_conditioner | engine_idling | air_conditioner | engine_idling |
| 2 | 57320-0-0-31.wav | 1 | air_conditioner | o | dog_bark | air_conditioner | air_conditioner | dog_bark | dog_bark | dog_bark | dog_bark | air_conditioner |
- Final metrics can be recalculated using Evaluation Notebook.
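If you run the Models Notebook in Google Colab, as was done in this work, mounting Google Drive typically looks like the minimal sketch below; the data path is only an assumption and has to match wherever you uploaded the preprocessed copies.

```python
# Minimal Colab sketch: mount Google Drive so the Models Notebook can read
# the preprocessed dataset copy. The folder path below is only an example.
from google.colab import drive

drive.mount('/content/drive')

DATA_DIR = '/content/drive/MyDrive/Data/pkl'  # adjust to your own upload location
```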
The training code is provided as part of the Models Notebook.
The Evaluation Notebook is responsible for the metrics and the resulting plots.
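As a rough illustration of what this amounts to (using the prediction column names from the example table above; the Evaluation Notebook remains the authoritative computation), per-technique accuracy can be derived from a pickled predictions dataframe like this:

```python
import pandas as pd

# Load the pickled predictions dataframe produced by the Models Notebook.
results = pd.read_pickle('EfNetV2B1Res.pkl')

# Each augmentation column holds a predicted class; compare it to the true class.
techniques = ['dist', 'mixup', 'imixup', 'room', 'spectrum', 'warp', 'delay', 'all']
for technique in techniques:
    accuracy = (results[technique] == results['class']).mean()
    print(f'{technique}: {accuracy:.3f}')
```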
Models used in this work can either be fine-tuned from pre-trained weights or built from scratch. In either case, there is no need to load weights separately; they are downloaded automatically by the library used in this work. For more details, see: TensorFlow Docs.
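For instance, a minimal tf.keras sketch of either option could look as follows; the input shape is a placeholder, and UrbanSound8K has ten classes:

```python
import tensorflow as tf

# Pre-trained backbone: weights are downloaded automatically by Keras on first use.
backbone = tf.keras.applications.EfficientNetV2B1(
    include_top=False,
    weights='imagenet',          # or weights=None to build the model from scratch
    input_shape=(224, 224, 3),   # placeholder input shape
    pooling='avg',
)

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation='softmax'),  # UrbanSound8K has 10 classes
])
```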
A proper evaluation of the prediction results is provided in the thesis itself. However, the Evaluation Notebook contains additional data from which you can draw your own conclusions.
The project has the following structure:
```
├── Box Plot for EfficientNetV2B1 Accuracies.png -- Example of the results
├── EfNetV2B1Res.pkl -- EfficientNetV2B1 predictions
├── EfNetV2B2Res.pkl -- EfficientNetV2B2 predictions
├── Examples -- Real audio data examples subfolder
│   ├── Hall Recording Example.mp3
│   ├── New Recording.mp3
│   ├── Original Recording Example.mp3
│   └── Street Recording Example.mp3
├── README.md
├── imgs -- Subfolder storing high res plots
├── Data -- Subfolder storing data
│   ├── pkl -- Subfolder with preprocessed data copies
│   └── UrbanSound8K -- Subfolder with original data
├── requirements.txt -- Library requirements
└── src -- Code folder
    ├── Evaluation.ipynb -- Evaluation of model predictions
    ├── Models.ipynb -- Model training notebook
    ├── Motivation.ipynb -- Notebook explaining the domain
    ├── Preprocessing.ipynb -- Data preprocessing Notebook
    └── py -- Subfolder for supporting .py files
        ├── augmenters.py -- Augmentation Techniques
        ├── batchproc.py -- Functions for Batch processing
        ├── helpers.py -- Plots and supporting functions
        └── setup.py -- Imports and global variable definitions
```
Note that the Data and imgs folders are excluded from the repo due to their size.
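The augmentation techniques themselves are implemented in src/py/augmenters.py. Purely as a generic illustration of one of the compared ideas (waveform-level mixup), and not as a copy of the project's implementation, the core operation can be sketched as:

```python
import numpy as np

def mixup_waveforms(signal_a: np.ndarray, signal_b: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Blend two equally long waveforms with a Beta-distributed mixing weight."""
    lam = np.random.beta(alpha, alpha)
    # In full mixup the (one-hot) labels are blended with the same weight lam.
    return lam * signal_a + (1.0 - lam) * signal_b
```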