
add RESULTS for kaldi pybind LF-MMI pipeline with PyTorch. #3831

Merged
merged 2 commits into from
Jan 20, 2020

Conversation

csukuangfj
Contributor

@csukuangfj csukuangfj commented Jan 9, 2020

The training logs for kaldi pybind with PyTorch and for nnet3 training are also included.

kaldi pybind shares the same network architecture and feats.scp with nnet3.

There are only two differences between kaldi pybind and nnet3:
(1) kaldi pybind uses BatchNorm in place of the first LDA layer;
(2) kaldi pybind uses an optimizer from PyTorch.

WER/CER from kaldi nnet3 is better than from kaldi pybind, but kaldi pybind training with PyTorch is much faster.

The total training time for 6 epochs is summarized as follows:

  • kaldi pybind with PyTorch: about 45 minutes
  • kaldi nnet3: about 4 hours 37 minutes == 277 minutes

It is possible that kaldi nnet3 could converge in fewer epochs to a point with better CER/WER than kaldi pybind.

A very simple scheduler is used in PyTorch; the results for kaldi pybind may be improved
by using a better learning rate scheduler.


So what do we gain from kaldi pybind?

  1. Training time: it is much faster.
  2. Freedom to use the various kinds of networks supported by PyTorch; it is also very easy
    to write your own nn.Module.
  3. You can try distributed training supported by PyTorch, e.g., DDP, or use Horovod.
  4. Other fancy stuff, limited only by your imagination.
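Point 3 can be sketched as follows. This is a generic DDP setup, not code from this PR; the model and its dimensions (feat-dim 43, output-dim 4336, taken from the training log later in this thread) are placeholders.

```python
import torch
import torch.nn as nn

# Minimal sketch of optionally wrapping a model in DistributedDataParallel.
# In a real multi-GPU run, each worker process would first call
# torch.distributed.init_process_group(...) before wrapping.

def build_model(feat_dim: int = 43, output_dim: int = 4336) -> nn.Module:
    # Placeholder network, not the one used in this PR.
    return nn.Sequential(
        nn.Linear(feat_dim, 625),
        nn.ReLU(),
        nn.Linear(625, output_dim),
    )

def maybe_wrap_ddp(model: nn.Module, device_id: int = 0) -> nn.Module:
    # Wrap only when a process group is already initialized;
    # otherwise fall back to plain single-process training.
    if torch.distributed.is_available() and torch.distributed.is_initialized():
        return nn.parallel.DistributedDataParallel(model, device_ids=[device_id])
    return model
```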

@csukuangfj
Contributor Author

@danpovey @naxingyu @jtrmal @songmeixu @qindazhu

Please review.

@csukuangfj
Contributor Author

A screenshot of TensorBoard for kaldi pybind with PyTorch is shown below.

You can get this image by executing run.sh, waiting for the training stage to finish,
and then running

tensorboard  --logdir ./exp/chain/train/tensboard

[Screenshot: TensorBoard training curves, 2020-01-09]

@danpovey
Contributor

danpovey commented Jan 9, 2020

Also, can you please implement the delta+delta-delta feature extraction as part of the network? This should improve the results. You can follow recent Kaldi scripts for guidance.
And does the nnet3 baseline have i-vector?

@csukuangfj
Contributor Author

The nnet3 baseline uses NO ivector. It shares the same feats.scp with kaldi pybind.

@csukuangfj
Contributor Author

I will implement delta+delta-delta later.
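The delta + delta-delta extraction suggested above can be folded into the network as a fixed convolution. The sketch below uses the standard regression formula with window 2 (kernel (-2, -1, 0, 1, 2)/10 along time); it is an illustration of the idea, not the exact code that was eventually merged.

```python
import torch
import torch.nn.functional as F

# Fixed (non-learned) delta kernel: sum of i * x[t+i] over i in [-2, 2],
# normalized by sum of i^2 = 10.
_DELTA = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0]) / 10.0

def add_deltas(feats: torch.Tensor) -> torch.Tensor:
    """feats: (batch, time, dim) -> (batch, time, 3 * dim)
    with [static, delta, delta-delta] blocks along the last axis."""
    b, t, d = feats.shape
    # Treat every feature dimension as its own 1-channel sequence.
    x = feats.permute(0, 2, 1).reshape(b * d, 1, t)
    kernel = _DELTA.view(1, 1, -1)
    # Replicate-pad the edges so the output length stays t.
    delta = F.conv1d(F.pad(x, (2, 2), mode="replicate"), kernel)
    delta2 = F.conv1d(F.pad(delta, (2, 2), mode="replicate"), kernel)
    out = torch.cat([x, delta, delta2], dim=1)          # (b*d, 3, t)
    return out.reshape(b, d, 3, t).permute(0, 3, 2, 1).reshape(b, t, 3 * d)
```

Because the kernel is fixed, this adds no parameters; it simply becomes the first "layer" of the network.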

@jtrmal
Contributor

jtrmal commented Jan 9, 2020

Closing and reopening to trigger the Travis checks.

@jtrmal jtrmal closed this Jan 9, 2020
@jtrmal jtrmal reopened this Jan 9, 2020
@francisr
Contributor

francisr commented Jan 9, 2020

What makes the Pytorch training so much faster than Kaldi's?

@csukuangfj
Contributor Author

csukuangfj commented Jan 9, 2020

@francisr

The baseline network consists of

  • convolution,
  • batchnorm,
  • fully connected

I am not sure why PyTorch is so much faster than kaldi. I guess GEMM/GEMV in PyTorch is better optimized than in kaldi.
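For context, a stand-in for that kind of network might look like the following. The dimensions (feat-dim 43, hidden-dim 625, output-dim 4336) come from the training log later in this thread, but the exact layer stack is an assumption.

```python
import torch
import torch.nn as nn

# Illustrative convolution + batchnorm + fully-connected stack,
# not the actual model from this PR.

class TdnnSketch(nn.Module):
    def __init__(self, feat_dim: int = 43, hidden_dim: int = 625,
                 output_dim: int = 4336):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm1d(hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim)
        y = x.permute(0, 2, 1)            # (batch, feat_dim, time) for Conv1d
        y = torch.relu(self.bn(self.conv(y)))
        y = y.permute(0, 2, 1)            # back to (batch, time, hidden_dim)
        return self.fc(y)                 # (batch, time, output_dim)
```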

@csukuangfj
Contributor Author

I will force-push to remove the included log files; you can find them here:

kaldi-pybind-with-pytorch-training-log.txt
nnet3-training-log.txt

@RuABraun
Contributor

RuABraun commented Jan 9, 2020

Surprised by how large the speed difference is! Awesome stuff.

@naxingyu
Contributor

The nnet3 baseline uses NO ivector. It shares the same feats.scp with kaldi pybind.

Wait, the aishell nnet3 baseline uses i-vectors. Which baseline are you referring to?

@csukuangfj
Contributor Author

closing and reopening to trigger travis CI.

@csukuangfj csukuangfj closed this Jan 13, 2020
@csukuangfj csukuangfj reopened this Jan 13, 2020
@fanlu

fanlu commented Jan 14, 2020

Changing the milestones of the MultiStepLR scheduler from [2,6,8,9] to [1,2,3,4,5] gives about 0.5~0.6 better precision.

==> exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/test/scoring_kaldi/best_cer <==
%WER 9.37 [ 9817 / 104765, 606 ins, 670 del, 8541 sub ] exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/test/cer_10_1.0

==> exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/test/scoring_kaldi/best_wer <==
%WER 18.16 [ 11701 / 64428, 1009 ins, 1926 del, 8766 sub ] exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/test/wer_12_0.5

==> exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/dev/scoring_kaldi/best_cer <==
%WER 7.69 [ 15790 / 205341, 668 ins, 801 del, 14321 sub ] exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/dev/cer_9_1.0

==> exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/dev/scoring_kaldi/best_wer <==
%WER 15.89 [ 20293 / 127698, 2055 ins, 2733 del, 15505 sub ] exp/chain/decode_res_train_modelmodel_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5/dev/wer_10_0.0
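The milestone change discussed above corresponds to something like the following sketch; the model, optimizer, and gamma=0.5 are illustrative placeholders, not values confirmed in this thread.

```python
import torch

# Sketch of a MultiStepLR schedule with milestones [1, 2, 3, 4, 5]:
# the learning rate is multiplied by gamma at each milestone epoch.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[1, 2, 3, 4, 5], gamma=0.5)

lrs = []
for epoch in range(6):
    # ... run one epoch of training here ...
    optimizer.step()   # scheduler.step() must come after optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```

With these milestones the learning rate is halved after each of the first five epochs, then stays constant.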

@csukuangfj
Contributor Author

@fanlu

Great to see that you can reproduce the results.

I will update the parameters as you suggested.

@csukuangfj
Contributor Author

By the way, does your training time per epoch or per batch match the log posted above?

@fanlu

fanlu commented Jan 14, 2020

By the way, does your training time per epoch or per batch match the log posted above?

Training on a P40 may take more time than yours; it is about 66 minutes.

2020-01-14 14:12:00,333 INFO [train3.py:113] ./chain/train3.py --checkpoint= --device-id 3 --dir exp/chain/train_modelmodel_opadam_bs128_ep6_lr5e-4_fpe150_110_90_hn625_fpr1500000_ms1_2_3_4_5 --feat-dim 43 --hidden-dim 625 --is-training true --kernel-size-list 1, 3, 3, 3, 3, 3 --log-level info --output-dim 4336 --stride-list 1, 1, 3, 1, 1, 1 --multi-step 1, 2, 3, 4, 5 --model-name model --train.cegs-dir exp/chain/merged_egs --train.den-fst exp/chain/den.fst --train.egs-left-context 13 --train.egs-right-context 13 --train.l2-regularize 5e-4 --train.lr 5e-4 --train.num-epochs 6
2020-01-14 15:18:22,849 WARNING [train3.py:261] Done

@csukuangfj
Contributor Author

@fanlu
Thanks.

# Results for kaldi pybind LF-MMI training with PyTorch
## head exp/chain/decode_res/*/scoring_kaldi/best_* > RESULTS
#
==> exp/chain/decode_res/dev/scoring_kaldi/best_cer <==
Contributor

The naming scheme is not obvious from this file... what is "res"? Please clarify this, and also chain_nnet3.
And can you please make sure that these results (and where appropriate, the output of chain_dir_info.pl) are
in a comment at the top of the script that generated them?

Contributor Author

Thanks, I will change it to follow the current style of egs/swbd/s5c/RESULTS.

# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
Contributor

How much do we lose from removing i-vectors? If you could make a comparison with run_tdnn_1a.sh via compare_wer.sh and put it in a comment at the top, that would be ideal. (If there is no compare_wer.sh, please
see if someone over there can make one for this setup!).

Contributor Author

I did not use i-vectors since I have not figured out how to integrate them into PyTorch.
I will try to add i-vectors and compare the results with and without them.
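One common way to integrate i-vectors (an assumption about how it could be done here, not what this PR does) is to append the utterance-level i-vector to every frame before the first layer:

```python
import torch

def append_ivector(feats: torch.Tensor, ivector: torch.Tensor) -> torch.Tensor:
    """feats: (batch, time, feat_dim); ivector: (batch, ivector_dim).
    Returns (batch, time, feat_dim + ivector_dim): the same i-vector
    is broadcast across all frames of its utterance."""
    b, t, _ = feats.shape
    expanded = ivector.unsqueeze(1).expand(b, t, ivector.size(1))
    return torch.cat([feats, expanded], dim=2)
```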

--feat.cmvn-opts "--norm-means=false --norm-vars=false" \
--chain.xent-regularize $xent_regularize \
--chain.leaky-hmm-coefficient 0.1 \
--chain.l2-regularize 0.00005 \
Contributor

BTW, these days we tend to set chain.l2-regularize to zero and instead rely on l2 regularization in the TDNN or TDNN-F layers. This reminds me that this recipe is super old! Does someone at mobvoi have time to test out a more recent recipe? E.g. you could try out the current Swbd recipe (I don't remember how much data is in aishell). We need to make sure that we are comparing against a recent baseline, or we won't be aiming for the right place!!

Contributor Author

No problem; I will switch to the recipe in swbd.

@danpovey
Contributor

Just noticed this branch has conflicts.

@csukuangfj
Contributor Author

I will resolve the conflicts tomorrow and try the new recipes during the Chinese New Year.

@danpovey
Contributor

danpovey commented Jan 20, 2020 via email

@csukuangfj
Contributor Author

Conflicts resolved.

@danpovey danpovey merged commit 9aff362 into kaldi-asr:pybind11 Jan 20, 2020
@qindazhu
Contributor

I have run the latest recipe for aishell: #3868

@csukuangfj csukuangfj deleted the fangjun-LF-MMI-benchmark branch February 12, 2020 00:00