Add doc about FST-based CTC forced alignment. #1482
Hi, can the alignment tool produce word timestamps that are accurate at both the begin and end positions?
It depends on what model you use. You can have a look at https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html
@csukuangfj Will it be completed soon?
Yes. I am working on it now.
@csukuangfj Has the k2-based approach been forgotten?
No, it is
@csukuangfj Sorry to bother you; I look forward to your reply. I recently compared several alignment methods, including torchaudio (CTC), WhisperX, and FunASR, and found that none of them were as good as Kaldi-based alignment. This conclusion is consistent with the paper https://www.isca-archive.org/interspeech_2024/rousso24_interspeech.pdf. Do you have any advice on alignment, or any alignment work planned in the k2-fsa project?
Kaldi's TDNN systems have limited context, which may give better alignment. (Or GMMs may give even more precise alignments as they have even less context). I'm not sure that alignment is a super big priority in k2-fsa right now, is there a specific type of application you have in mind?
Thanks, Daniel, for answering a question I have had for a long time (is less context better for alignment?). Yes, I work on spoken pronunciation scoring, which relies heavily on the accuracy of phoneme and word alignment.
It is based on the "CTC forced alignment API tutorial" from torchaudio, but we are using
an FST-based approach.
Using https://github.com/k2-fsa/kaldi-decoder, I can produce output identical to torchaudio's.
I am refactoring the code and will prepare at least two colab notebooks.
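For readers new to the idea, here is a minimal sketch of what CTC forced alignment computes. It is not the kaldi-decoder, k2, or torchaudio API: the function names `ctc_forced_align` and `merge_spans` are hypothetical, and the code is plain Python. An FST-based approach would compose the transcript with a CTC topology and search the resulting graph; for a single transcript that graph is linear (blank, token, blank, token, ..., blank), so the sketch hard-codes it as a list of states and runs Viterbi over it.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best CTC path (one token id per frame) for a known transcript.

    log_probs: list of T frames, each a list of per-token log-probabilities.
    tokens:    non-empty list of token ids, without blanks.
    """
    # Linear alignment graph: blank, tok1, blank, tok2, ..., blank.
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states, num_frames = len(states), len(log_probs)

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][states[0]]
    score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed moves: stay, advance by one, or skip a blank
            # between two different tokens.
            best_prev, best = s, score[t - 1][s]
            if s >= 1 and score[t - 1][s - 1] > best:
                best_prev, best = s - 1, score[t - 1][s - 1]
            can_skip = (
                s >= 2 and states[s] != blank and states[s] != states[s - 2]
            )
            if can_skip and score[t - 1][s - 2] > best:
                best_prev, best = s - 2, score[t - 1][s - 2]
            if best > neg_inf:
                score[t][s] = best + log_probs[t][states[s]]
                back[t][s] = best_prev

    # A path must end in the last token or the trailing blank.
    last = num_frames - 1
    if score[last][num_states - 1] >= score[last][num_states - 2]:
        s = num_states - 1
    else:
        s = num_states - 2
    if score[last][s] == neg_inf:
        raise ValueError("too few frames to align this transcript")

    # Trace the back-pointers from the last frame to the first.
    path = [blank] * num_frames
    for t in range(last, -1, -1):
        path[t] = states[s]
        s = back[t][s]
    return path


def merge_spans(path, blank=0):
    """Collapse a frame-level path into (token, start_frame, end_frame) spans."""
    spans = []
    for t, tok in enumerate(path):
        if tok == blank:
            continue
        if spans and spans[-1][0] == tok and spans[-1][2] == t:
            spans[-1][2] = t + 1  # extend the current span
        else:
            spans.append([tok, t, t + 1])
    return [tuple(span) for span in spans]
```

Multiplying the frame indices in each span by the acoustic model's frame shift gives begin and end times in seconds. How accurate those times are still depends on the model, as discussed above.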