This repository contains our code for the 4 labs of the Machine Learning for Audio Application course @ Politecnico di Milano. The course covers the fundamentals of Machine Learning and Deep Learning, with a particular focus on audio applications.
The course was split into 4 parts. At the end of each part we had a mandatory lab, which consisted of writing a Jupyter notebook in Python.
Part I: Introduction
• Course structure
• Goals of this course
• Overview of signal processing and machine learning in audio and acoustic engineering
• A detailed example: sound recognition
• A bit of history
• Challenges in ML for audio
• Applications
• Development process of a smart audio system
Part II: Audio Analysis and Pre-processing
• Data acquisition and annotation
• Time-frequency transforms: STFT, Perceptual Representations, Constant-Q transforms
• Feature engineering:
  • Time-domain features, spectral features, MFCCs, periodicity (a short pre-processing sketch follows this list)
• Feature learning: Feature learning vs feature engineering, NMF
• Dimensionality reduction: PCA, feature selection
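
As a rough illustration of the pre-processing covered in this part, here is a minimal sketch that computes an STFT, MFCCs, and a PCA projection. It assumes `librosa` and `scikit-learn` are available and uses a placeholder file name (`example.wav`); it is an illustrative example under those assumptions, not the actual lab code.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

# Load a mono clip at its native sampling rate ("example.wav" is a placeholder path).
y, sr = librosa.load("example.wav", sr=None, mono=True)

# Short-Time Fourier Transform: complex spectrogram -> magnitude in dB.
S = librosa.stft(y, n_fft=1024, hop_length=512)
S_db = librosa.amplitude_to_db(np.abs(S), ref=np.max)

# MFCCs: a common engineered feature set for audio tasks.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Dimensionality reduction: project the per-frame MFCC vectors onto 5 principal components.
mfcc_pca = PCA(n_components=5).fit_transform(mfcc.T)

print(S_db.shape, mfcc.shape, mfcc_pca.shape)
```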
Part III: Machine Learning and Deep Learning Methods
• Supervised and unsupervised learning
• The learning process, generalization
• Discriminative models
  • Linear models (SVMs, Logistic Regression) (see the classification sketch after this list)
  • Multiclass models
  • Non-linear models (Kernel methods, k-NN, Decision Trees)
• Generative models
  • Bayesian, Maximum Likelihood (ML) and MAP estimation, GMMs, HMMs
• Deep models
  • MLP, 1D CNNs, 2D CNNs, RNNs, Hybrid models
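
For the discriminative models in this part, here is a minimal scikit-learn sketch of training a linear SVM on per-clip feature vectors. The data is randomly generated as a placeholder and the pipeline choices are assumptions, not the pipeline used in the labs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one 13-dimensional feature vector per clip
# (e.g. time-averaged MFCCs) and a binary label for each clip.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 13))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize the features, then fit a linear-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```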
Part IV: Performance Analysis
• Datasets: Annotation, data-collection considerations
• Data augmentation: Time stretching, pitch shifting, noise addition (see the sketch after this list)
• Evaluation: Cross-validation, performance metrics
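
The augmentation and evaluation topics above can be illustrated with a short sketch: waveform-level augmentations via `librosa` and k-fold cross-validation via scikit-learn. The signal, features, and labels below are random placeholders, not data from the labs.

```python
import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Waveform-level augmentation (2 seconds of noise as a stand-in signal) ---
sr = 22050
y = rng.standard_normal(2 * sr)

y_stretched = librosa.effects.time_stretch(y, rate=1.2)       # time stretching
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch shifting
y_noisy = y + 0.005 * rng.standard_normal(len(y))             # noise addition

# --- Evaluation: 5-fold cross-validation on placeholder features and labels ---
X = rng.standard_normal((100, 13))
labels = rng.integers(0, 2, size=100)
scores = cross_val_score(SVC(kernel="rbf"), X, labels, cv=5, scoring="accuracy")
print("mean CV accuracy:", scores.mean())
```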