add RESULTS for kaldi pybind LF-MMI pipeline with PyTorch. #3831
Conversation
@danpovey @naxingyu @jtrmal @songmeixu @qindazhu please have a review.
Force-pushed from 459f5ee to d953df8.
Also, can you please implement the delta+delta-delta feature extraction as part of the network? This should improve the results. You can follow recent Kaldi scripts for guidance.
The nnet3 baseline uses NO ivector. It shares the same network architecture and feats.scp with this setup.
I will implement the delta+delta-delta feature extraction.
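For illustration, here is a minimal sketch (not the PR's actual code) of how delta+delta-delta extraction could live inside the network as a fixed 1-D convolution. The (N, T, C) tensor layout and the Kaldi default delta window of 2 are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AddDeltas(nn.Module):
    """Append delta and delta-delta features to the input.

    Input:  (N, T, C) raw features, e.g. MFCC
    Output: (N, T, 3*C)
    """

    def __init__(self, window: int = 2):
        super().__init__()
        self.window = window
        k = torch.arange(-window, window + 1, dtype=torch.float32)
        # Kaldi-style delta filter: sum_k k*(x[t+k] - x[t-k]) / (2*sum_k k^2)
        kernel = k / (2.0 * (k[k > 0] ** 2).sum())
        # A buffer moves with .to(device)/.cuda() but is never trained.
        self.register_buffer("kernel", kernel.view(1, 1, -1))

    def _delta(self, x):
        # x: (N, C, T); apply the same 1-d filter to every channel.
        n, c, t = x.shape
        x = x.reshape(n * c, 1, t)
        x = F.pad(x, (self.window, self.window), mode="replicate")
        x = F.conv1d(x, self.kernel)
        return x.reshape(n, c, t)

    def forward(self, x):
        x = x.transpose(1, 2)            # (N, C, T)
        d1 = self._delta(x)              # delta
        d2 = self._delta(d1)             # delta-delta
        out = torch.cat([x, d1, d2], dim=1)
        return out.transpose(1, 2)       # (N, T, 3*C)


# Usage: feats of shape (batch, num_frames, feat_dim)
# deltas = AddDeltas()(torch.randn(4, 100, 40))   # -> (4, 100, 120)
```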
closing and reopening to trigger the travis checks
What makes the PyTorch training so much faster than Kaldi's?
The baseline network consists of
I am not sure why PyTorch is so much faster than kaldi. I guess GEMM/GEMV in PyTorch is better optimized than in kaldi.
I will force push to remove the contained log files; you can find them here: kaldi-pybind-with-pytorch-training-log.txt
Force-pushed from d953df8 to 44ae951.
Surprised by how large the speed difference is! Awesome stuff.
Wait, the aishell nnet3 baseline uses ivector. Which baseline are you referring to?
I was referring to this file, which has no ivector.
closing and reopening to trigger travis CI.
Changing the milestones in the MultiStepLR scheduler from [2,6,8,9] to [1,2,3,4,5] gives about 0.5~0.6 better precision.
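For reference, a minimal sketch of the suggested change using PyTorch's MultiStepLR scheduler; the model, learning rate, and gamma below are placeholders, not values from the recipe.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model and optimizer; the real recipe's differ.
model = torch.nn.Linear(120, 3456)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Decay the learning rate at the end of epochs 1..5 instead of [2, 6, 8, 9].
scheduler = MultiStepLR(optimizer, milestones=[1, 2, 3, 4, 5], gamma=0.5)

num_epochs = 6
for epoch in range(num_epochs):
    # ... run one epoch of training here ...
    scheduler.step()
```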
Great to see that you can reproduce the results. I will update the parameters as you suggested.
By the way, does your training time per epoch or per batch match the log posted above?
Training on a P40 may take more time than yours; it's about 66 minutes.
@fanlu
# Results for kaldi pybind LF-MMI training with PyTorch
## head exp/chain/decode_res/*/scoring_kaldi/best_* > RESULTS
#
==> exp/chain/decode_res/dev/scoring_kaldi/best_cer <==
The naming scheme is not obvious from this file... what is "res"? Please clarify this, and also chain_nnet3.
And can you please make sure that these results (and where appropriate, the output of chain_dir_info.pl) are
in a comment at the top of the script that generated them?
Thanks, I will change it to follow the current style of egs/swbd/s5c/RESULTS.
# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
How much do we lose from removing i-vectors? If you could make a comparison with run_tdnn_1a.sh via compare_wer.sh and put it in a comment at the top, that would be ideal. (If there is no compare_wer.sh, please
see if someone over there can make one for this setup!).
I did not use ivector since I have not figured out how to integrate it into PyTorch. I will try to add ivector and compare the results with/without using ivector.
--feat.cmvn-opts "--norm-means=false --norm-vars=false" \
--chain.xent-regularize $xent_regularize \
--chain.leaky-hmm-coefficient 0.1 \
--chain.l2-regularize 0.00005 \
BTW, these days we tend to set chain.l2-regularize to zero and instead rely on l2 regularization in the TDNN or TDNN-F layers. This reminds me that this recipe is super old! Does someone at mobvoi have time to test out a more recent recipe? E.g. you could try out the current Swbd recipe (I don't remember how much data is in aishell). We need to make sure that we are comparing against a recent baseline, or we won't be aiming for the right place!!
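On the PyTorch side of this setup, a roughly analogous idea is per-parameter-group L2 (weight decay) rather than a single global chain-level L2 term. The following is only a sketch with placeholder dimensions; it is not claimed that the recipe does this.

```python
import torch
import torch.nn as nn

# Placeholder model; the real acoustic model has different layers and sizes.
model = nn.Sequential(
    nn.Linear(120, 625),
    nn.BatchNorm1d(625),
    nn.ReLU(),
    nn.Linear(625, 3456),
)

decay, no_decay = [], []
for name, p in model.named_parameters():
    # A common convention: 1-d parameters (biases, BatchNorm scales) get no decay.
    (no_decay if p.ndim == 1 else decay).append(p)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 5e-5},   # mirrors the 0.00005 above
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
    momentum=0.9,
)
```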
No problem, I will switch to follow the recipe in swbd.
Just noticed this branch has conflicts.
I will resolve the conflicts tomorrow and try the new recipes during the Chinese New Year.
Thanks!!!
BTW check with Haowen before trying the new recipes... I think he was
going to try that.
Force-pushed from 0fb1df1 to e8a28b5.
Conflicts resolved.
I have run the latest recipe for aishell: #3868
The training logs for kaldi pybind with PyTorch and for nnet3 training are also included.
kaldi pybind shares the same network architecture and feats.scp with nnet3. There are only two differences between kaldi pybind and nnet3:
(1) kaldi pybind uses BatchNorm to replace the first LDA layer (see the sketch below);
(2) kaldi pybind uses the optimizer from PyTorch.
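A rough sketch of difference (1), assuming spliced (N, T, C) input features; this is an illustration, not the PR's actual module.

```python
import torch
import torch.nn as nn


class SplicedInputNorm(nn.Module):
    """Normalize spliced input features with BatchNorm instead of a fixed
    LDA affine transform (nnet3: fixed-affine-layer with lda.mat)."""

    def __init__(self, feat_dim: int, left_context: int = 1, right_context: int = 1):
        super().__init__()
        num_frames = left_context + 1 + right_context   # Append(-1, 0, 1)
        # affine=False: pure normalization, no learned scale/offset.
        self.norm = nn.BatchNorm1d(feat_dim * num_frames, affine=False)

    def forward(self, x):
        # x: (N, T, feat_dim * num_frames) spliced features
        n, t, d = x.shape
        return self.norm(x.reshape(n * t, d)).reshape(n, t, d)


# Usage with hypothetical 40-dim features spliced over (-1, 0, 1):
# y = SplicedInputNorm(40)(torch.randn(8, 200, 120))
```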
The WER/CER from kaldi nnet3 is better than that from kaldi pybind, but kaldi pybind training with PyTorch is much faster.
The total training time for 6 epochs is summarized as follows:
It is possible that kaldi nnet3 can converge in fewer epochs to a point that has better CER/WER than kaldi pybind.
A very simple scheduler is used in PyTorch; the results for kaldi pybind may be improved
by using a better learning rate scheduler.
So what do we gain from kaldi pybind? You can write your own nn.Module.
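To make that concrete, here is a toy illustration (placeholder dimensions, not the PR's actual model) of what writing your own nn.Module looks like: the acoustic model is an ordinary PyTorch module whose output dimension equals the number of pdf-ids of the chain tree.

```python
import torch
import torch.nn as nn


class TdnnLayer(nn.Module):
    def __init__(self, in_dim, out_dim, context=3, dilation=1):
        super().__init__()
        # padding=dilation keeps the output length equal to the input length.
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=context,
                              dilation=dilation, padding=dilation)
        self.norm = nn.BatchNorm1d(out_dim, affine=False)

    def forward(self, x):              # x: (N, C, T)
        return torch.relu(self.norm(self.conv(x)))


class ChainModel(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=625, num_pdfs=3456):
        super().__init__()
        self.layers = nn.Sequential(
            TdnnLayer(feat_dim, hidden_dim),
            TdnnLayer(hidden_dim, hidden_dim, dilation=3),
            TdnnLayer(hidden_dim, hidden_dim, dilation=3),
        )
        self.output = nn.Conv1d(hidden_dim, num_pdfs, kernel_size=1)

    def forward(self, feats):          # feats: (N, T, feat_dim)
        x = feats.transpose(1, 2)      # (N, feat_dim, T)
        return self.output(self.layers(x))   # (N, num_pdfs, T)


# Usage: logits = ChainModel()(torch.randn(4, 150, 40))   # -> (4, 3456, 150)
```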