Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition
DPSL-ASR is a novel method for end-to-end noise-robust speech recognition. It extends our prior work, IFF-Net (Interactive Feature Fusion Network), with dual-path inputs and style learning, and achieves better ASR performance on the Robust Automatic Transcription of Speech (RATS) and CHiME-4 datasets.
Left figure: (a) joint SE-ASR approach, (b) IFF-Net baseline, (c) our proposed DPSL-ASR approach.
Right figure: back-end ASR module with style learning and consistency loss in DPSL-ASR. The dashed arrows denote shared parameters.
If you find DPSL-ASR or IFF-Net useful in your research, please cite them using the following BibTeX entries:
@inproceedings{hu2023dual,
title={Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition},
author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
booktitle={INTERSPEECH},
year={2023}
}
@inproceedings{hu2022interactive,
title={Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition},
author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={6292--6296},
year={2022},
organization={IEEE}
}
Our implementation is based on ESPnet (v.0.9.6). Please use the following commands to install it:
git clone https://github.com/YUCHEN005/DPSL-ASR.git
cd DPSL-ASR
pip install -e .
The experiment directory is at egs2/rats_chA/asr_with_enhancement/, and the network code is at espnet2/asr/dpsl_asr.py.
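As a quick sanity check after cloning, the sketch below reports whether the two paths named above exist under a given repository root. The helper name `check_dpsl_paths` is hypothetical and not part of the repository; it only tests for the presence of the paths and does not verify the ESPnet installation itself.

```shell
# Hypothetical helper (not shipped with the repository): report whether the
# experiment directory and network code named in this README are present.
check_dpsl_paths() {
    root="${1:-.}"   # repository root; defaults to the current directory
    rc=0
    for p in egs2/rats_chA/asr_with_enhancement espnet2/asr/dpsl_asr.py; do
        if [ -e "$root/$p" ]; then
            echo "found: $p"
        else
            echo "missing: $p"
            rc=1
        fi
    done
    return "$rc"
}

# Run from the DPSL-ASR root after cloning.
check_dpsl_paths . || echo "Some paths are missing; run this from the DPSL-ASR root."
```

The function returns a non-zero status when either path is missing, so it can also be used as a guard at the top of a script before launching an experiment.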