Add CTC training #3
Changes from 1 commit
```diff
@@ -75,3 +75,15 @@ if [ $stage -le 4 ] && [ $stop_stage -ge 4 ]; then
   mkdir -p data/fbank
   ./local/compute_fbank_musan.py
 fi
+
+if [ $stage -le 5 ] && [ $stop_stage -ge 5 ]; then
+  echo "Stage 5: Prepare phone based lang"
+  # TODO: add BPE based lang
```
Incidentally, something I want to try (I was working on this in Snowfall) is to generate a BPE lexicon parallel to the phone-based lexicon, and then a dual lexicon that contains both phones (with disambiguation symbols) and BPE symbols. We could call this a "dual lang directory". It would have words.txt, phones.txt and bpe.txt.

Then, I was thinking, we could train on both phone and BPE symbols, perhaps alternating between them on different minibatches if training time is a concern. We could even decode this way, by manipulating or rescoring lattices.

Cool idea.
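The dual lang directory proposed in this thread could be laid out roughly as follows. This is a minimal sketch, not Snowfall or icefall code: it assumes a BPE lexicon already exists, and the file name `lexicon_bpe.txt`, the directory name `lang_dual`, the helper `make_symtab` and the BPE pieces themselves are all hypothetical, made-up placeholders. It builds only the three symbol tables, each in the Kaldi-style `symbol id` format with `<eps>` at id 0, and omits the disambiguation symbols a real lang directory also needs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "dual lang directory": words.txt, phones.txt and
# bpe.txt built from a phone lexicon plus an assumed BPE lexicon.
# Disambiguation symbols are intentionally omitted.
set -euo pipefail
export LC_ALL=C  # byte-order sorting, so symbol ids are reproducible

# make_symtab (hypothetical helper): read tokens on stdin, one per line, and
# print "token id" pairs with <eps> reserved as id 0.
make_symtab() {
  { echo '<eps>'; sort -u; } | awk '{ print $1, NR - 1 }'
}

dir=$(mktemp -d)
out="$dir/lang_dual"   # hypothetical directory name
mkdir -p "$out"

# Toy phone lexicon: word followed by its phones.
cat > "$dir/lexicon.txt" <<'EOF'
!SIL SIL
HELLO HH AH0 L OW1
WORLD W ER1 L D
EOF

# Toy BPE lexicon (made-up pieces, assumed to come from a BPE model).
cat > "$dir/lexicon_bpe.txt" <<'EOF'
HELLO ▁HE LLO
WORLD ▁WOR LD
EOF

# Words are the first field of either lexicon; both views share words.txt.
cut -d' ' -f1 "$dir/lexicon.txt" "$dir/lexicon_bpe.txt" | make_symtab > "$out/words.txt"

# Phones and BPE pieces are the remaining fields of their own lexicon.
awk '{ for (i = 2; i <= NF; i++) print $i }' "$dir/lexicon.txt" | make_symtab > "$out/phones.txt"
awk '{ for (i = 2; i <= NF; i++) print $i }' "$dir/lexicon_bpe.txt" | make_symtab > "$out/bpe.txt"

cat "$out/phones.txt"
```

Sharing a single words.txt between the two lexicons is what would let phone-based and BPE-based graphs or lattices refer to the same word ids; the lexicon graphs and disambiguation symbols are left out of this sketch.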
```diff
+  mkdir -p data/lang
+
+  (echo '!SIL SIL'; echo '<SPOKEN_NOISE> SPN'; echo '<UNK> SPN'; ) |
+    cat - data/lm/librispeech-lexicon.txt |
+    sort | uniq > data/lang/lexicon.txt
+
+  ./local/prepare_lang.py
+fi
```
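A stage body runs only when its number lies between `$stage` and `$stop_stage`, so `stage=5 stop_stage=5` runs Stage 5 alone. The sketch reproduces the lexicon step of Stage 5 on a made-up, three-line stand-in for `data/lm/librispeech-lexicon.txt` inside a temporary directory. It is an illustration under those assumptions: `./local/prepare_lang.py` is deliberately not invoked, and forcing `LC_ALL=C` is an addition for a reproducible sort order.

```shell
#!/usr/bin/env bash
# Sketch of the Stage 5 lexicon step on toy data.
# Assumptions: a tiny made-up stand-in for data/lm/librispeech-lexicon.txt;
# ./local/prepare_lang.py is NOT run here.
set -euo pipefail
export LC_ALL=C   # byte-ordered `sort`, so the output order is reproducible

stage=${stage:-5}
stop_stage=${stop_stage:-5}

work=$(mktemp -d)
mkdir -p "$work/data/lm"
# Toy lexicon; note the duplicated HELLO entry.
cat > "$work/data/lm/librispeech-lexicon.txt" <<'EOF'
HELLO HH AH0 L OW1
WORLD W ER1 L D
HELLO HH AH0 L OW1
EOF

cd "$work"
# A stage runs only when its number falls inside [stage, stop_stage].
if [ "$stage" -le 5 ] && [ "$stop_stage" -ge 5 ]; then
  echo "Stage 5: Prepare phone based lang"
  mkdir -p data/lang
  # Prepend entries for silence, spoken noise and unknown words, merge them
  # with the downloaded lexicon, and let `sort | uniq` drop exact duplicates.
  (echo '!SIL SIL'; echo '<SPOKEN_NOISE> SPN'; echo '<UNK> SPN'; ) |
    cat - data/lm/librispeech-lexicon.txt |
    sort | uniq > data/lang/lexicon.txt
fi
```

With the defaults, the duplicated HELLO entry collapses and data/lang/lexicon.txt ends up with five lines, the three special entries first under byte order. Running with `stage=6` makes the `if` test fail, so nothing is written.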
lable -> label
Thanks. Fixed.