Mel-Cepstral Distance

Python library to compute the Mel-Cepstral Distance (also called Mel-Cepstral Distortion, MCD) between two audio signals, based on "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment" by Robert F. Kubichek.

Note on the new version [2024/12/05]

The current code is a complete refactoring of the previous codebase. It improves clarity and follows the method described in the original paper more closely.

Key changes include:

Removal of CLI: The command-line interface has been eliminated to streamline the functionality and focus on core features.
Improved Calculation: The computation now adheres more closely to the approach outlined in the original research.
Pause Removal: Functionality for removing pauses has been added.
Enhanced Literature Review: The relevant literature was reviewed to refine the default parameter values. However, the referenced papers do not specify every necessary detail, so some defaults rely on interpretation.
Reduced Dependencies: Non-essential dependencies have been removed, including librosa, resulting in a more lightweight and focused package.

For now, it is recommended to clone the repository and install it with pip install . rather than using the PyPI version. Further updates to the codebase are planned for an upcoming version.

Test coverage

---------- coverage: platform linux, python 3.8.20-final-0 -----------
Name                                       Stmts   Miss  Cover   Missing
------------------------------------------------------------------------
src/mel_cepstral_distance/__init__.py          1      0   100%
src/mel_cepstral_distance/alignment.py        70      0   100%
src/mel_cepstral_distance/api.py             365      0   100%
src/mel_cepstral_distance/computation.py      68      0   100%
src/mel_cepstral_distance/helper.py           38      0   100%
src/mel_cepstral_distance/silence.py          56      0   100%
------------------------------------------------------------------------
TOTAL                                        598      0   100%

Installation

pip install mel-cepstral-distance --user

Example usage

Compare two audio files with default parameters

from mel_cepstral_distance import compare_audio_files

mcd, penalty = compare_audio_files(
  'examples/GT.wav',
  'examples/WaveGlow.wav',
)

print(f'MCD: {mcd:.2f}, Penalty: {penalty:.4f}')
# MCD: 4.03, Penalty: 0.0197

Calculation

Spectrogram

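$$ X(k, m) = \sum_{n=0}^{N-1} x_k(n)\, e^{-j 2\pi m n / N}, \quad m = 0, 1, \dots, \lfloor N/2 \rfloor $$

Where:

$X(k, m)$: The complex output of the real-input FFT for the $k$-th frame at frequency bin $m$. Its magnitude $|X(k, m)|$ is the amplitude spectrogram.
$x_k(n)$: The time-domain signal of the $k$-th frame.
$N$: The FFT length.
$\text{FFT}$: The discrete Fourier transform for real-valued input, computed using np.fft.rfft, which returns only the non-negative frequency bins.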
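A minimal NumPy sketch of this step follows. It makes several assumptions that are not taken from the library:

- the frames are Hann-windowed;
- the frame length equals the FFT size;
- the FFT size is 512 and the hop length is 128;
- the helper name `spectrogram` is hypothetical.

The library's own framing and defaults may differ.

```python
import numpy as np


def spectrogram(x: np.ndarray, n_fft: int = 512, hop_length: int = 128) -> np.ndarray:
    """Return X(k, m) for every frame k; shape (n_frames, n_fft // 2 + 1).

    Hypothetical helper: the Hann window, n_fft and hop_length are
    illustrative assumptions, not the library's defaults.
    """
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    # Rows are the frames x_k(n), each n_fft samples long.
    frames = np.stack(
        [x[k * hop_length : k * hop_length + n_fft] for k in range(n_frames)]
    )
    # np.fft.rfft keeps only the non-negative frequency bins m = 0 .. n_fft // 2.
    return np.fft.rfft(frames * window, n=n_fft, axis=1)


sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)  # 1 s test tone at 1 kHz
X = spectrogram(x)
print(X.shape)                  # (169, 257)
print(np.argmax(np.abs(X[0])))  # 23, i.e. 23 * 22050 / 512 ≈ 990 Hz
```

The strongest bin of the first frame lies next to the 1 kHz tone. The bin spacing is 22050 / 512 ≈ 43 Hz, so 1 kHz falls between bins 23 and 24, closer to 23.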

Mel spectrogram

$$ X_{k,n} = \log_{10}\left\lbrace\sum_{m} |X(k, m)|^2 \cdot w_n(m)\right\rbrace, \quad n = 1, \dots, M $$

Where:

$X_{k,n}$: The logarithmic Mel-scaled power spectrogram for the $k$-th frame at Mel frequency $n$.
$X(k, m)$: The spectrum of the $k$-th frame at frequency bin $m$ (the sum runs over all FFT bins $m$). $|X(k, m)|^2$ is the power in that bin.
$M$: The total number of Mel frequency bins.
$w_n(m)$: The Mel filter bank weights for Mel frequency $n$ and frequency bin $m$.
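The following sketch builds triangular Mel filters $w_n(m)$ and applies the formula above to a batch of spectra. Several parts are assumptions for illustration:

- the HTK-style Mel scale $2595 \log_{10}(1 + f/700)$;
- the triangular filter shape with peak 1;
- the small `eps` floor before the logarithm;
- the helper names.

The library's own filter bank may be constructed differently.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale; an assumption, other Mel formulas exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Hypothetical triangular weights w_n(m), shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_hz = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # frequency of each bin m
    w = np.zeros((n_mels, bin_hz.size))
    for n in range(n_mels):
        lo, center, hi = edges[n], edges[n + 1], edges[n + 2]
        rising = (bin_hz - lo) / (center - lo)
        falling = (hi - bin_hz) / (hi - center)
        # Triangle with peak 1 at the center frequency, zero outside [lo, hi].
        w[n] = np.clip(np.minimum(rising, falling), 0.0, None)
    return w


def log_mel(X, w, eps=1e-10):
    """X_{k,n} = log10(sum_m |X(k, m)|^2 * w_n(m)); eps guards against log10(0)."""
    power = np.abs(X) ** 2              # shape (frames, bins)
    return np.log10(power @ w.T + eps)  # shape (frames, n_mels)


sr, n_fft = 22050, 512
w = mel_filterbank(sr, n_fft, n_mels=20)
frames = np.random.default_rng(0).standard_normal((3, n_fft))
X = np.fft.rfft(frames, axis=1)
print(w.shape, log_mel(X, w).shape)  # (20, 257) (3, 20)
```

Writing the weighted sum as a matrix product `power @ w.T` evaluates $\sum_m |X(k, m)|^2 w_n(m)$ for every frame $k$ and every Mel band $n$ at once.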

Mel-frequency cepstral coefficients

$$ MC_X(i, k) = \sum_{n=1}^{M} X_{k,n} \cos\left[i\left(n - \frac{1}{2}\right)\frac{\pi}{M}\right] $$

Where:

$MC_X(i, k)$: The $i$-th Mel-frequency cepstral coefficient (MFCC) for the $k$-th frame.
$X_{k,n}$: The logarithmic Mel-scaled power spectrogram for the $k$-th frame at Mel frequency $n$.
$M$: The total number of Mel frequency bins.
$i$: The index of the MFCC being computed.
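The cosine sum is an unnormalized type-II discrete cosine transform (DCT-II) of each frame, so it can be written as one matrix product over all frames. This sketch makes the following assumptions:

- the log-Mel input is an array of shape (frames, $M$);
- coefficients are returned for $i = 0, \dots, n_\text{coeffs} - 1$;
- the helper name `mfcc` is hypothetical.

Which coefficients the library keeps (for example, whether $i = 0$ is used) is not stated here.

```python
import numpy as np


def mfcc(log_mel: np.ndarray, n_coeffs: int) -> np.ndarray:
    """MC_X(i, k) = sum_{n=1}^{M} X_{k,n} cos[i (n - 1/2) pi / M].

    Hypothetical helper. log_mel has shape (frames, M); the result has
    shape (n_coeffs, frames) and covers i = 0 .. n_coeffs - 1.
    """
    _, M = log_mel.shape
    n = np.arange(1, M + 1)                    # Mel index n = 1 .. M
    i = np.arange(n_coeffs)[:, np.newaxis]     # coefficient index i, as a column
    basis = np.cos(i * (n - 0.5) * np.pi / M)  # shape (n_coeffs, M)
    return basis @ log_mel.T                   # shape (n_coeffs, frames)


log_mel_frames = np.ones((4, 20))  # constant log-Mel spectrum: 4 frames, M = 20
C = mfcc(log_mel_frames, n_coeffs=13)
print(C.shape)  # (13, 4)
print(C[0, 0])  # 20.0
```

For a constant log-Mel spectrum, only the $i = 0$ coefficient is nonzero (it equals $M$), and all higher coefficients vanish. This is a quick sanity check on the cosine basis.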