
Support CTC/AED option for Zipformer recipe #1389

Merged (9 commits) on Jul 5, 2024

Conversation

yaozengwei (Collaborator) commented Nov 22, 2023

This PR adds a CTC/AED system to the Zipformer recipe.

  • CTC/AED results on LibriSpeech (WER on test-clean / test-other), trained for 50 epochs with --ctc-loss-scale=0.1 and --attention-decoder-loss-scale=0.9; decoding method: sample 100-best paths from the CTC lattice and rescore them with the attention decoder
    • Zipformer-S, 46.3M, 2.46 / 6.04
    • Zipformer-M, 90.0M, 2.22 / 4.97
    • Zipformer-L, 174.3M, 2.09 / 4.59
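The rescoring step described above can be sketched as follows. This is a schematic illustration, not the recipe's actual code: the real pipeline samples paths from a CTC lattice with k2 and scores them with the Zipformer attention decoder, whereas here the hypotheses and log-probabilities are made up, and the 0.1 / 0.9 weights (borrowed from the training loss scales) stand in for whatever decoding weights the recipe actually uses.

```python
def rescore_nbest(hyps, ctc_scores, attn_scores, ctc_scale=0.1, attn_scale=0.9):
    """Pick the hypothesis maximizing a weighted sum of CTC and
    attention-decoder log-probabilities."""
    assert len(hyps) == len(ctc_scores) == len(attn_scores)
    totals = [ctc_scale * c + attn_scale * a
              for c, a in zip(ctc_scores, attn_scores)]
    best = max(range(len(hyps)), key=lambda i: totals[i])
    return hyps[best], totals[best]

# Toy 3-best list (stand-in for the 100 paths sampled from the CTC lattice):
hyps = ["the cat sat", "the cat sad", "a cat sat"]
ctc_scores = [-4.2, -4.0, -5.1]   # CTC path log-probs (made up)
attn_scores = [-3.1, -6.5, -4.8]  # attention-decoder log-probs (made up)

best_hyp, best_score = rescore_nbest(hyps, ctc_scores, attn_scores)
```

The point of the sketch is that the attention decoder can overrule the raw CTC ranking: the CTC scores above mildly prefer the second hypothesis, but the combined score selects the first.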

@zw76859420 zw76859420 mentioned this pull request May 6, 2024
@yaozengwei yaozengwei merged commit f76afff into k2-fsa:master Jul 5, 2024
7 of 8 checks passed
xingchensong commented:
Nice results! It seems that Zipformer-M CTC/AED (90.0M, 2.22 / 4.97) is comparable to Zipformer-RNNT (65.55M, 2.21 / 4.82), and Zipformer-L CTC/AED (174.3M, 2.09 / 4.59) surpasses all prior benchmarks.

We know that in an RNNT model the majority of the parameters sit in the encoder, while in a CTC/AED model the decoder accounts for a significant share. This makes the Zipformer CTC/AED model appear to have many more parameters than the Zipformer RNNT, even though the encoder parameter counts of the two models might actually be quite similar. I'm therefore particularly intrigued by how the parameters in Zipformer-L CTC/AED are split between the encoder and decoder components. Could you provide more details?

yaozengwei (Collaborator, Author) commented Jul 7, 2024

> Nice results! It seems that Zipformer-M CTC/AED (90.0M, 2.22 / 4.97) is comparable to Zipformer-RNNT (65.55M, 2.21 / 4.82), and Zipformer-L CTC/AED (174.3M, 2.09 / 4.59) surpasses all prior benchmarks.
>
> We know that in an RNNT model the majority of the parameters sit in the encoder, while in a CTC/AED model the decoder accounts for a significant share. This makes the Zipformer CTC/AED model appear to have many more parameters than the Zipformer RNNT, even though the encoder parameter counts of the two models might actually be quite similar. I'm therefore particularly intrigued by how the parameters in Zipformer-L CTC/AED are split between the encoder and decoder components. Could you provide more details?

For these Zipformer CTC/AED models, we keep the attention-decoder configuration almost the same across sizes. (The different encoder output dimensions of Zipformer-S/M/L lead to slightly different parameter counts in the attention decoder.) For example:

  • Zipformer-L CTC/AED: 174,319,650 parameters in total; 146,013,641 in the encoder; 27,309,556 in the attention decoder.
  • Zipformer-M CTC/AED: 89,987,295 parameters in total; 63,382,150 in the encoder; 25,736,692 in the attention decoder.
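Per-submodule counts like the ones above are typically obtained in PyTorch with something like `sum(p.numel() for p in model.encoder.parameters())`. The sketch below mimics that idiom with a plain dict mapping parameter names to shapes, so it runs without PyTorch; the submodule names follow the `encoder` / `attention_decoder` split quoted above, but the shapes are invented for illustration.

```python
from math import prod

def count_parameters(named_shapes, prefix=""):
    """Sum the parameter counts of all entries whose name starts with
    `prefix` (empty prefix counts the whole model)."""
    return sum(prod(shape) for name, shape in named_shapes.items()
               if name.startswith(prefix))

# Hypothetical (name -> shape) table, standing in for model.named_parameters():
named_shapes = {
    "encoder.layer0.weight": (512, 512),
    "encoder.layer0.bias": (512,),
    "attention_decoder.embed.weight": (500, 512),
    "attention_decoder.out.weight": (500, 512),
}

total = count_parameters(named_shapes)                      # whole model
enc = count_parameters(named_shapes, "encoder.")            # encoder only
dec = count_parameters(named_shapes, "attention_decoder.")  # decoder only
```

Grouping by name prefix in this way is also how one can see where any leftover parameters live (e.g. the CTC output head, which belongs to neither the encoder nor the attention decoder).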

yfyeung pushed a commit to yfyeung/icefall that referenced this pull request Aug 9, 2024
* add attention-decoder loss option for zipformer recipe

* add attention-decoder-rescoring

* update export.py and pretrained_ctc.py

* update RESULTS.md