Insertion of numbers in transcriptions #7
Comments
Hmm, possibly. I will check whether there is a bug in the alignment tools.
They are all over the place. I think these are page numbers or something similar. Sometimes they are even Roman numerals, sometimes they are in brackets, and sometimes they appear as plain numbers.
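As a rough way to find such tokens for manual review, a minimal sketch follows. It is not part of the project's tooling; the names `is_suspect` and `suspect_tokens` are hypothetical, and it assumes transcripts are whitespace-tokenized. It flags plain digits, bracketed numbers, and Roman numerals of two or more letters. Real words that happen to be valid Roman numerals (for example "mix") will also be flagged, so the output is a review list, not a deletion list.

```python
import re

# Strict Roman-numeral shape (1..3999); case-insensitive because transcript
# casing is unknown. This pattern is an illustrative assumption.
ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
    re.IGNORECASE,
)


def is_suspect(token):
    """Return True if a token looks like a page or verse number."""
    core = token.strip("()[]{}")  # numbers sometimes appear in brackets
    if not core:
        return False
    if core.isdigit():
        return True
    # Require at least two letters so the pronoun "I" is not flagged.
    return len(core) >= 2 and ROMAN.fullmatch(core) is not None


def suspect_tokens(text):
    """List (index, token) pairs worth checking against the audio."""
    return [(i, t) for i, t in enumerate(text.split()) if is_suspect(t)]


if __name__ == "__main__":
    print(suspect_tokens("in the beginning 008 028 god created"))
```

Flagging instead of deleting keeps numbers that are actually spoken in the audio.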
Hmm. We filter the segments according to the Levenshtein distance between the original text and the transcript text. When a segment is long, insertions like these may barely affect the overall distance, so the distance still stays below the given threshold. We have not yet figured out how to fix this bug.
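To illustrate why long segments slip through, here is a small sketch. It assumes a word-level Levenshtein distance normalized by segment length and a threshold of 0.1; both the normalization and the threshold value are assumptions for illustration, not the project's actual settings. The same two inserted tokens pass the filter in a 40-word segment but fail it in an 8-word one.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (here: lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(original, transcript):
    """Word-level edit distance divided by the longer segment length."""
    ref, hyp = original.split(), transcript.split()
    return levenshtein(ref, hyp) / max(len(ref), len(hyp), 1)


THRESHOLD = 0.1  # hypothetical value, not taken from the project

words = [f"w{i}" for i in range(40)]  # stand-in for 40 spoken words
spoken_long = " ".join(words)
spoken_short = " ".join(words[:8])

# The original text carries two verse-number tokens that are never read.
original_long = " ".join(words[:20] + ["008", "028"] + words[20:])
original_short = " ".join(words[:4] + ["008", "028"] + words[4:8])

long_score = normalized_distance(original_long, spoken_long)     # 2 / 42
short_score = normalized_distance(original_short, spoken_short)  # 2 / 10

if __name__ == "__main__":
    print(long_score < THRESHOLD, short_score < THRESHOLD)
```

Under these assumptions, a check on the absolute number of inserted tokens, or on a sliding window within the segment, would catch the long case as well.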
In the Kaldi-format transcription files, there are sometimes insertions of numbers that are not present in the audio. For example:
Above, 008 028 and 008 029 seem to be Bible verse numbers that are in the original text but not actually read. I have verified this by listening to the audio sample.