A simple PyTorch reimplementation of the python_speech_features library.
- Great for interpretability experiments: every audio processing operation runs on tensors, so the results can be backpropagated to the original signal tensor.
- Supports hybrid model design: parametric operations can be placed at different stages of audio processing.
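To make the two points above concrete, here is a minimal sketch in plain PyTorch. It does not call pytorch_speech_features; `LearnablePreemphasis` is a hypothetical module written for this illustration, and the starting value 0.97 is simply the default `preemph` of python_speech_features. It shows a gradient reaching both the raw signal and a trainable processing parameter.

```python
import torch
import torch.nn as nn


class LearnablePreemphasis(nn.Module):
    """Hypothetical example: y[n] = x[n] - a * x[n-1] with a trainable a."""

    def __init__(self, init_coeff: float = 0.97):
        super().__init__()
        # Registering the coefficient as a Parameter makes it trainable.
        self.coeff = nn.Parameter(torch.tensor(init_coeff))

    def forward(self, signal: torch.Tensor) -> torch.Tensor:
        # The first sample is kept unchanged; the rest are filtered.
        return torch.cat([signal[:1], signal[1:] - self.coeff * signal[:-1]])


torch.manual_seed(0)
signal = torch.randn(1600, requires_grad=True)  # stand-in for 0.1 s of 16 kHz audio
preemph = LearnablePreemphasis()

# A scalar objective (log of the total energy) to backpropagate from.
loss = torch.log(preemph(signal).pow(2).sum())
loss.backward()

print(signal.grad.shape)   # gradient with respect to the raw signal
print(preemph.coeff.grad)  # gradient with respect to the learnable coefficient
```

In an interpretability setting, `signal.grad` is the quantity of interest; in a hybrid model, `preemph.coeff` would be handed to an optimizer together with the rest of the network's parameters.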
Install from PyPI
pip install pytorch-speech-features
Install from GitHub
git clone https://github.com/Debjoy10/pytorch_speech_features
cd pytorch_speech_features
python setup.py develop
The functions are the same as in python_speech_features (refer to its documentation for details).
Instead of passing the input signal as a list or NumPy array, pass a tensor (both 'cpu' and 'cuda' tensors are supported).
See example use given above.
Supported features:
- Mel Frequency Cepstral Coefficients
- Filterbank Energies
- Log Filterbank Energies
- Spectral Subband Centroids
The tests for pytorch_speech_features operations check two things:
- Similarity to python_speech_features outputs.
- Gradient correctness via Autograd Gradcheck.
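The two checks can be sketched as follows. This is an illustration, not the repository's test suite: `powspec_numpy` and `powspec_torch` are hypothetical stand-ins (a NumPy power spectrum and a PyTorch port of it) for a python_speech_features function and its pytorch_speech_features counterpart. Note that `torch.autograd.gradcheck` expects double-precision inputs with `requires_grad=True`.

```python
import numpy as np
import torch


def powspec_numpy(frames: np.ndarray, nfft: int) -> np.ndarray:
    """Reference: power spectrum of each frame, |rfft|^2 / nfft."""
    return np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft


def powspec_torch(frames: torch.Tensor, nfft: int) -> torch.Tensor:
    """PyTorch port of the reference, differentiable with respect to frames."""
    spec = torch.fft.rfft(frames, n=nfft)
    return (spec.real ** 2 + spec.imag ** 2) / nfft


torch.manual_seed(0)
nfft = 16
# Small double-precision input: gradcheck compares analytical gradients
# against finite differences, which needs float64 and a modest input size.
frames = torch.randn(4, nfft, dtype=torch.float64, requires_grad=True)

# Check 1: the PyTorch output matches the NumPy reference.
outputs_match = np.allclose(
    powspec_numpy(frames.detach().numpy(), nfft),
    powspec_torch(frames, nfft).detach().numpy(),
)

# Check 2: autograd gradients agree with numerical gradients.
grads_ok = torch.autograd.gradcheck(lambda x: powspec_torch(x, nfft), (frames,))

print(outputs_match, grads_ok)
```

To test a real operation, the two stand-ins would be replaced by the matching pair of library functions, with the input kept small so the finite-difference pass stays fast.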
@misc{https://doi.org/10.5281/zenodo.8021586,
doi = {10.5281/ZENODO.8021586},
url = {https://zenodo.org/record/8021586},
author = {{Debjoy Saha}},
title = {Debjoy10/pytorch_speech_features: Release v0.0.1},
publisher = {Zenodo},
year = {2023},
copyright = {Open Access}
}