Add CTC training #3

csukuangfj · 2021-07-15T09:44:44Z

There are various code formatting and style issues in snowfall since
it is written by different people with different preferred styles.

This pull request tries to ensure that code styles in icefall are as consistent as possible; all happens automagically with the help of the following tools:

pzelasko · 2021-07-15T11:24:11Z

+1 for black, not sure about mypy — it will be very strict about typing and we might end up spending a lot of extra effort to adhere to its strict checks.

csukuangfj · 2021-07-15T11:50:47Z

it will be very strict about typing and we might end up spending a lot of extra effort to adhere to its strict checks.

ok, in that case, I can remove mypy.

csukuangfj · 2021-07-15T13:10:42Z

I am trying to add the data preparation part for the LibriSpeech recipe and am trying to put everything in Python.

pzelasko · 2021-07-15T15:14:18Z

Are you considering porting prepare_lang.sh to Python? I have wanted to do it for some time now... it will be very useful for adding new recipes.

danpovey · 2021-07-16T09:21:08Z

If we do port it to Python I don't think we need an entire Kaldi-compatible lang directory, we can just keep the things we need for k2. We might have several different possible formats, e.g. a lexicon-based version and a BPE-based version with different information. But I think writing things to files in a directory is a good idea as it makes them easy to inspect.

pzelasko · 2021-07-19T15:39:22Z

egs/librispeech/ASR/prepare.sh

  mkdir -p data/LibriSpeech
-  # TODO
+
+  if [ ! -f data/LibriSpeech/train-other-500/.completed ]; then


I don't think that this check is technically needed; i.e. the download script would figure the completion status by itself

OTOH I am concerned about fixing the location of the dataset in data/LibriSpeech -- people tend to have corpora downloaded in some standard locations on their own setups, I think this should remain customizable

the download script would figure the completion status by itself

If the zipped files are removed after extraction, the script will download it again. That's why I add this check to avoid invoking the download script.

people tend to have corpora downloaded in some standard locations on their own setups, I think this should remain customizable

Fix the location of the dataset makes the code simpler. Can we let the users create a symlink to its original dataset path, i.e.,

ln -s /path/to/LibriSpeech data/

as mentioned in the script

icefall/egs/librispeech/ASR/prepare.sh

Lines 21 to 25 in 0b19aa0

# If you have pre-downloaded it to /path/to/LibriSpeech,

# you can create a symlink to avoid downloading it again:

#

# ln -sfv /path/to/LibriSpeech data/

#

ah, I see, I missed the bit about the symbolic link before. I am OK with that.

If the zipped files are removed after extraction, the script will download it again. That's why I add this check to avoid invoking the download script.

I see... maybe this is the right way then. I'm not sure if there is a straightforward way to address this issue inside of lhotse download.

... I take that back -- I think we can change Lhotse so that the "completed detector" is executed before downloading the files (move this line a few lines up)

I can make this change later for all the recipes. WDYT @csukuangfj ?

I can make this change later for all the recipes.

That's would be great. In that case, I won't need to add that check here.

pzelasko · 2021-07-19T15:40:43Z

egs/librispeech/ASR/prepare.sh

+  #   ln -s /path/to/musan data/
+  #
+  if [ ! -e data/musan ]; then
+    wget https://www.openslr.org/resources/17/musan.tar.gz


There is download_musan() here: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/musan.py#L27

and $ lhotse download musan here: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/bin/modes/recipes/musan.py#L25

csukuangfj · 2021-07-19T15:43:13Z

Are you considering porting prepare_lang.sh to Python?

Yes, but I am going to implement only a subset of prepare_lang.sh that is needed by snowfall, i.e., it contains
no lexiconp.txt, no unk_fst, no position-dependent phones, no extra questions.txt, no silprob, no word boundary, no grammar_opts. Only those that are currently used in snowfall will be ported to Python.

If that's ok, then I will go ahead. Otherwise, I will use the current prepare_lang.sh

pzelasko · 2021-07-19T15:43:48Z

egs/librispeech/ASR/local/prepare_librispeech_manifest.py

+    output_dir = Path("data/manifests")
+    num_jobs = min(15, os.cpu_count())
+
+    librispeech_manifests = prepare_librispeech(


I think this and the download script can be completely replaced with:

$ lhotse download librispeech --full $CORPUS_DIR $ lhotse prepare librispeech -j $NUM_JOBS $CORPUS_DIR $MANIFEST_DIR

pzelasko · 2021-07-19T15:45:33Z

egs/librispeech/ASR/local/compute_fbank_musan.py

+
+
+@contextmanager
+def get_executor():


This executor bit seems like a good candidate to move to the library-level?

Will move it to a new file local/utils.py

pzelasko · 2021-07-19T15:46:13Z

Are you considering porting prepare_lang.sh to Python?

Yes, but I am going to implement only a subset of prepare_lang.sh that is needed by snowfall, i.e., it contains
no lexiconp.txt, no unk_fst, no position-dependent phones, no extra questions.txt, no silprob, no word boundary, no grammar_opts. Only those that are currently used in snowfall will be ported to Python.

If that's ok, then I will go ahead. Otherwise, I will use the current prepare_lang.sh

That sounds good to me!

csukuangfj · 2021-07-20T11:51:36Z

I've ported a subset of prepare_lang.sh to local/prepare_lang.py

Here are some test results.

Input lexicon.txt

!SIL SIL
<SPOKEN_NOISE> SPN
<UNK> SPN
f f
a a
foo f o o
bar b a r
bark b a r k
food f o o d
food2 f o o d
fo f o

The following are outputs:

lexicon_disambig.txt

!SIL SIL
<SPOKEN_NOISE> SPN #1
<UNK> SPN #2
f f #1
a a
foo f o o #1
bar b a r #1
bark b a r k
food f o o d #1
food2 f o o d #2
fo f o #1

phones.txt

<eps> 0
SIL 1
SPN 2
a 3
b 4
d 5
f 6
k 7
o 8
r 9
#0 10
#1 11
#2 12

words.txt

<eps> 0
!SIL 1
<SPOKEN_NOISE> 2
<UNK> 3
a 4
bar 5
bark 6
f 7
fo 8
foo 9
food 10
food2 11
#0 12
<s> 13
</s> 14

L.fst

L_disambig.fst

danpovey · 2021-07-20T11:55:43Z

Great!!
BTW, I anticipate possibly having different versions of the lexicon for BPE and phones, either separately (separate systems) or at the same time
(jointly trained systems). The BPE version might use a more compact representation, e.g as a ragged array, at the python level, although that's separate from the on-disk representation where we may choose something maximally human readable.
I'm not trying to dictate anything here about the formats-- I would probably overthink things-- I'm just saying this stuff so you can keep it in mind and make your own decisions.
It might be that the Lexicon object, if there is such a thing, would have phone and BPE versions with different characteristics and perhaps some methods in common but not all. Again, I don't want to dictate any of this, just giving ideas.

csukuangfj · 2021-07-21T02:38:17Z

I just found that prepare_lang.sh adds base phones to the lexicon.
That is, there are AA0, AA1, AA2 in the lexicon; that script adds a new phone AA to the phones.txt.

I cannot think of any benefits of adding AA. It never appears in the training data since it does not
exist in the lexicon. Moreover, it increases the number of neural output units and never gets trained.

@danpovey Should the python version prepare_lang.py follow prepare_lang.sh to add AA or just ignore it?

danpovey · 2021-07-21T02:57:25Z

You can ignore it. The only time it's really needed is for the optional silence.
Note: technically the silence can appear inside words-- at least, we have no check against this-- which is why SIL_B snd so on still exist.

danpovey · 2021-07-21T09:42:58Z

egs/librispeech/ASR/local/prepare_lang.py

+    states_needs_self_loops = set()
+    for arc in arcs:
+        src, dst, ilable, olable, score = arc
+        if olable != 0:


lable -> label

Thanks. Fixed.

csukuangfj · 2021-07-27T09:49:28Z

Here are the decoding results with icefall (using only the encoder part) from the pre-trained model (downloaded from https://huggingface.co/GuoLiyong/snowfall_bpe_model/tree/main/exp-duration-200-feat_batchnorm-bpe-lrfactor5.0-conformer-512-8-noam, as mentioned in k2-fsa/snowfall#227)

HLG - no LM rescoring

(output beam size is 8)

1-best decoding

[test-clean-no_rescore] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore] %WER 7.03% [3682 / 52343, 220 ins, 1024 del, 2438 sub ]

n-best decoding

For n=100,

[test-clean-no_rescore-100] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore-100] %WER 7.14% [3737 / 52343, 275 ins, 1020 del, 2442 sub ]

For n=200,

[test-clean-no_rescore-200] %WER 3.16% [1660 / 52576, 125 ins, 378 del, 1157 sub ]
[test-other-no_rescore-200] %WER 7.04% [3684 / 52343, 228 ins, 1012 del, 2444 sub ]

HLG - with LM rescoring

Whole lattice rescoring

[test-clean-lm_scale_0.8] %WER 2.77% [1456 / 52576, 150 ins, 210 del, 1096 sub ]
[test-other-lm_scale_0.8] %WER 6.23% [3262 / 52343, 246 ins, 635 del, 2381 sub ]

WERs of different LM scales are:

For test-clean, WER of different settings are:
lm_scale_0.8    2.77    best for test-clean
lm_scale_0.9    2.87
lm_scale_1.0    3.06
lm_scale_1.1    3.34
lm_scale_1.2    3.71
lm_scale_1.3    4.18
lm_scale_1.4    4.8
lm_scale_1.5    5.48
lm_scale_1.6    6.08
lm_scale_1.7    6.79
lm_scale_1.8    7.49
lm_scale_1.9    8.14
lm_scale_2.0    8.82

For test-other, WER of different settings are:
lm_scale_0.8    6.23    best for test-other
lm_scale_0.9    6.37
lm_scale_1.0    6.62
lm_scale_1.1    6.99
lm_scale_1.2    7.46
lm_scale_1.3    8.13
lm_scale_1.4    8.84
lm_scale_1.5    9.61
lm_scale_1.6    10.32
lm_scale_1.7    11.17
lm_scale_1.8    12.12
lm_scale_1.9    12.93
lm_scale_2.0    13.77

n-best LM rescoring

n = 100

[test-clean-lm_scale_0.8] %WER 2.79% [1469 / 52576, 149 ins, 212 del, 1108 sub ]
[test-other-lm_scale_0.8] %WER 6.36% [3329 / 52343, 259 ins, 666 del, 2404 sub ]

WERs of different LM scales are:

For test-clean, WER of different settings are:
lm_scale_0.8    2.79    best for test-clean
lm_scale_0.9    2.89
lm_scale_1.0    3.03
lm_scale_1.1    3.28
lm_scale_1.2    3.52
lm_scale_1.3    3.78
lm_scale_1.4    4.04
lm_scale_1.5    4.24
lm_scale_1.6    4.45
lm_scale_1.7    4.58
lm_scale_1.8    4.7
lm_scale_1.9    4.8
lm_scale_2.0    4.92
For test-other, WER of different settings are:
lm_scale_0.8    6.36    best for test-other
lm_scale_0.9    6.45
lm_scale_1.0    6.64
lm_scale_1.1    6.92
lm_scale_1.2    7.25
lm_scale_1.3    7.59
lm_scale_1.4    7.88
lm_scale_1.5    8.13
lm_scale_1.6    8.36
lm_scale_1.7    8.54
lm_scale_1.8    8.71
lm_scale_1.9    8.88
lm_scale_2.0    9.02

n = 150

[test-clean-lm_scale_0.8] %WER 2.80% [1472 / 52576, 149 ins, 218 del, 1105 sub ]
[test-other-lm_scale_0.8] %WER 6.35% [3325 / 52343, 262 ins, 660 del, 2403 sub ]

For test-clean, WER of different settings are:
lm_scale_0.8    2.8     best for test-clean
lm_scale_0.9    2.89
lm_scale_1.0    3.05
lm_scale_1.1    3.3
lm_scale_1.2    3.56
lm_scale_1.3    3.83
lm_scale_1.4    4.1
lm_scale_1.5    4.32
lm_scale_1.6    4.56
lm_scale_1.7    4.73
lm_scale_1.8    4.88
lm_scale_1.9    5.01
lm_scale_2.0    5.14
For test-other, WER of different settings are:
lm_scale_0.8    6.35    best for test-other
lm_scale_0.9    6.44
lm_scale_1.0    6.63
lm_scale_1.1    6.91
lm_scale_1.2    7.25
lm_scale_1.3    7.62
lm_scale_1.4    7.92
lm_scale_1.5    8.23
lm_scale_1.6    8.43
lm_scale_1.7    8.67
lm_scale_1.8    8.89
lm_scale_1.9    9.08
lm_scale_2.0    9.23

danpovey · 2021-07-27T11:22:07Z

Great! By the n-best LM rescoring, are you talking about a neural or n-gram LM?

pzelasko · 2021-07-27T11:25:12Z

Nice! Do we have the scripts for training the BPE models from scratch somewhere?

csukuangfj · 2021-07-27T11:26:38Z

Great! By the n-best LM rescoring, are you talking about a neural or n-gram LM?

I am only using the transformer encoder + HLG decoding + 4-gram rescoring.

Will integrate the attention decoder for rescoring.

csukuangfj · 2021-07-27T11:27:56Z

Nice! Do we have the scripts for training the BPE models from scratch somewhere?

Yes, the training code is in the pull-request: k2-fsa/snowfall#219

We're porting it to icefall and polishing the training code.

danpovey · 2021-07-29T03:49:18Z

I'd rather worry about the present than the future. This issue could easily become a limiting factor in whether people use Lhotse.

…

On Thu, Jul 29, 2021 at 12:18 AM Piotr Żelasko ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In .github/workflows/test.yml <#3 (comment)>: > + - uses: ***@***.*** + with: + fetch-depth: 0 + + - name: Setup Python ${{ matrix.python-version }} + uses: ***@***.*** + with: + python-version: ${{ matrix.python-version }} + + - name: Install Python dependencies + run: | + python3 -m pip install --upgrade pip pytest kaldialign + pip install k2==${{ matrix.k2-version }}+cpu.torch${{ matrix.torch }} -f https://k2-fsa.org/nightly/ + + # Don't use: pip install lhotse + # since it installs a version of PyTorch that is not predictable (it's not ideal because it downloads all the torchaudio versions from the newest going down to the right one, but at least it seems to work) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZFLO7WRXGHD34LVZTRIBDT2AUW3ANCNFSM5ANFYHUA> .

csukuangfj · 2021-07-31T07:34:44Z

The following are the partial results (1best decoding without LM rescoring and without attention decoder)
of the following training command:

./conformer_ctc/train.py \
  --bucketing-sampler 1 \
  --num-buckets 1000 \
  --concatenate-cuts 0 \
  --max-duration 200 \
  --full-libri 1 \
  --world-size 3

It seems it is able to reproduce what Liyong has been doing. I think it is safe to merge now. I will fix all the comments in
other PRs.

The tensorboard log is available at
https://tensorboard.dev/experiment/plZOLw07RUGYw8sGEybXyg/

epoch-0

[test-clean-no_rescore] %WER 34.43% [18104 / 52576, 56 ins, 13896 del, 4152 sub ]
[test-other-no_rescore] %WER 47.82% [25032 / 52343, 49 ins, 19273 del, 5710 sub ]

epoch-1

[test-clean-no_rescore] %WER 15.64% [8225 / 52576, 121 ins, 4689 del, 3415 sub ]
[test-other-no_rescore] %WER 29.53% [15455 / 52343, 130 ins, 9454 del, 5871 sub ]

average of epoch 0 and 1

[test-clean-no_rescore] %WER 16.69% [8775 / 52576, 222 ins, 4173 del, 4380 sub ]
[test-other-no_rescore] %WER 30.20% [15810 / 52343, 282 ins, 8426 del, 7102 sub ]

epoch-2

[test-clean-no_rescore] %WER 11.83% [6220 / 52576, 140 ins, 3309 del, 2771 sub ]
[test-other-no_rescore] %WER 23.83% [12471 / 52343, 162 ins, 7268 del, 5041 sub ]

average of epoch 1 and 2

[test-clean-no_rescore] %WER 9.40% [4940 / 52576, 172 ins, 2036 del, 2732 sub ]
[test-other-no_rescore] %WER 19.25% [10075 / 52343, 247 ins, 4521 del, 5307 sub ]

average of epoch 0, 1 and 2

[test-clean-no_rescore] %WER 15.06% [7918 / 52576, 244 ins, 3516 del, 4158 sub ]
[test-other-no_rescore] %WER 29.58% [15482 / 52343, 288 ins, 8137 del, 7057 sub ]

epoch-3

[test-clean-no_rescore] %WER 10.65% [5600 / 52576, 102 ins, 3013 del, 2485 sub ]
[test-other-no_rescore] %WER 23.10% [12091 / 52343, 127 ins, 7583 del, 4381 sub ]

average of epoch 2 and 3

[test-clean-no_rescore] %WER 7.38% [3880 / 52576, 147 ins, 1483 del, 2250 sub ]
[test-other-no_rescore] %WER 16.26% [8512 / 52343, 207 ins, 3808 del, 4497 sub ]

average of epoch 1, 2 and 3

[test-clean-no_rescore] %WER 8.49% [4464 / 52576, 164 ins, 1746 del, 2554 sub ]
[test-other-no_rescore] %WER 18.20% [9528 / 52343, 248 ins, 4221 del, 5059 sub ]

average of epoch 0, 1, 2 and 3

[test-clean-no_rescore] %WER 12.93% [6800 / 52576, 224 ins, 3028 del, 3548 sub ]
[test-other-no_rescore] %WER 26.00% [13609 / 52343, 312 ins, 6635 del, 6662 sub ]

epoch 4

[test-clean-no_rescore] %WER 8.40% [4414 / 52576, 131 ins, 2048 del, 2235 sub ]
[test-other-no_rescore] %WER 18.93% [9907 / 52343, 149 ins, 5410 del, 4348 sub ]

average of epoch 3 and 4

[test-clean-no_rescore] %WER 6.24% [3280 / 52576, 150 ins, 1100 del, 2030 sub
[test-other-no_rescore] %WER 14.12% [7390 / 52343, 197 ins, 3058 del, 4135 sub ]

average of epoch 2, 3 and 4

[test-clean-no_rescore] %WER 6.27% [3295 / 52576, 156 ins, 1051 del, 2088 sub ]
[test-other-no_rescore] %WER 14.61% [7645 / 52343, 232 ins, 2977 del, 4436 sub ]

average of epoch 1, 2, 3 and 4

[test-clean-no_rescore] %WER 8.83% [4643 / 52576, 170 ins, 1952 del, 2521 sub ]
[test-other-no_rescore] %WER 20.18% [10564 / 52343, 247 ins, 4855 del, 5462 sub ]

average of epoch 0, 1, 2, 3 and 4

[test-clean-no_rescore] %WER 14.56% [7657 / 52576, 202 ins, 3777 del, 3678 sub ]
[test-other-no_rescore] %WER 30.02% [15714 / 52343, 277 ins, 8482 del, 6955 sub ]

epoch 5

[test-clean-no_rescore] %WER 6.97% [3665 / 52576, 126 ins, 1484 del, 2055 sub ]
[test-other-no_rescore] %WER 16.21% [8483 / 52343, 157 ins, 4283 del, 4043 sub ]

average of epoch 4 and 5

[test-clean-no_rescore] %WER 5.47% [2877 / 52576, 144 ins, 911 del, 1822 sub ]
[test-other-no_rescore] %WER 12.90% [6752 / 52343, 218 ins, 2636 del, 3898 sub ]

average of epoch 3, 4 and 5

[test-clean-no_rescore] %WER 5.34% [2808 / 52576, 154 ins, 781 del, 1873 sub ]
[test-other-no_rescore] %WER 12.39% [6487 / 52343, 217 ins, 2265 del, 4005 sub ]

average of epoch 2, 3, 4 and 5

[test-clean-no_rescore] %WER 6.00% [3153 / 52576, 154 ins, 941 del, 2058 sub ]
[test-other-no_rescore] %WER 14.27% [7471 / 52343, 247 ins, 2648 del, 4576 sub ]

average of epoch 1, 2, 3, 4 and 5

[test-clean-no_rescore] %WER 10.10% [5311 / 52576, 160 ins, 2288 del, 2863 sub ]
[test-other-no_rescore] %WER 23.36% [12229 / 52343, 221 ins, 5877 del, 6131 sub ]

average of epoch 0, 1, 2, 3, 4 and 5

test-clean-no_rescore] %WER 22.07% [11606 / 52576, 149 ins, 6744 del, 4713 sub ]
[test-other-no_rescore] %WER 41.52% [21733 / 52343, 159 ins, 13730 del, 7844 sub ]

epoch 6

[test-clean-no_rescore] %WER 6.67% [3507 / 52576, 101 ins, 1603 del, 1803 sub ]
[test-other-no_rescore] %WER 15.59% [8160 / 52343, 141 ins, 4465 del, 3554 sub ]

average of epoch 5 and 6

[test-clean-no_rescore] %WER 5.23% [2750 / 52576, 127 ins, 944 del, 1679 sub ]
[test-other-no_rescore] %WER 12.23% [6403 / 52343, 187 ins, 2649 del, 3567 sub ]

average of epoch 4, 5 and 6

[test-clean-no_rescore] %WER 5.07% [2666 / 52576, 134 ins, 821 del, 1711 sub ]
[test-other-no_rescore] %WER 11.69% [6118 / 52343, 205 ins, 2322 del, 3591 sub ]

average of epoch 3, 4, 5 and 6

[test-clean-no_rescore] %WER 5.09% [2674 / 52576, 151 ins, 732 del, 1791 sub ]
[test-other-no_rescore] %WER 11.86% [6209 / 52343, 207 ins, 2125 del, 3877 sub ]

average of epoch 2, 3, 4, 5 and 6

[test-clean-no_rescore] %WER 5.96% [3136 / 52576, 152 ins, 951 del, 2033 sub ]
[test-other-no_rescore] %WER 14.77% [7730 / 52343, 260 ins, 2776 del, 4694 sub ]

average of epoch 1, 2, 3, 4, 5 and 6

[test-clean-no_rescore] %WER 12.04% [6328 / 52576, 151 ins, 2859 del, 3318 sub ]
[test-other-no_rescore] %WER 27.15% [14213 / 52343, 230 ins, 7208 del, 6775 sub ]

average of epoch 0, 1, 2, 3, 4, 5 and 6

[test-clean-no_rescore] %WER 34.10% [17928 / 52576, 145 ins, 11849 del, 5934 sub ]
[test-other-no_rescore] %WER 55.40% [29000 / 52343, 117 ins, 20099 del, 8784 sub ]

epoch 7

[test-clean-no_rescore] %WER 6.48% [3405 / 52576, 120 ins, 1468 del, 1817 sub ]
[test-other-no_rescore] %WER 14.98% [7843 / 52343, 139 ins, 4147 del, 3557 sub ]

average of epoch 6 and 7

[test-clean-no_rescore] %WER 5.15% [2706 / 52576, 117 ins, 987 del, 1602 sub ]
[test-other-no_rescore] %WER 12.09% [6330 / 52343, 174 ins, 2846 del, 3310 sub ]

average of epoch 5, 6 and 7

[test-clean-no_rescore] %WER 4.87% [2560 / 52576, 133 ins, 805 del, 1622 sub ]
[test-other-no_rescore] %WER 11.34% [5934 / 52343, 195 ins, 2379 del, 3360 sub ]

average of epoch 4, 5, 6 and 7

[test-clean-no_rescore] %WER 4.82% [2533 / 52576, 136 ins, 737 del, 1660 sub ]
[test-other-no_rescore] %WER 11.12% [5821 / 52343, 203 ins, 2125 del, 3493 sub ]

average of epoch 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 4.89% [2572 / 52576, 148 ins, 699 del, 1725 sub ]
[test-other-no_rescore] %WER 11.62% [6083 / 52343, 211 ins, 2091 del, 3781 sub ]

average of epoch 2, 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 5.95% [3128 / 52576, 153 ins, 949 del, 2026 sub ]
[test-other-no_rescore] %WER 14.77% [7730 / 52343, 250 ins, 2756 del, 4724 sub ]

average of epoch 1, 2, 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 13.60% [7148 / 52576, 156 ins, 3317 del, 3675 sub ]
[test-other-no_rescore] %WER 29.95% [15677 / 52343, 199 ins, 8156 del, 7322 sub ]

average of epoch 0, 1, 2, 3, 4, 5, 6 and 7

[test-clean-no_rescore] %WER 47.09% [24758 / 52576, 80 ins, 18322 del, 6356 sub ]
[test-other-no_rescore] %WER 67.21% [35182 / 52343, 52 ins, 26717 del, 8413 sub ]

epoch 8

[test-clean-no_rescore] %WER 6.44% [3386 / 52576, 117 ins, 1648 del, 1621 sub ]
[test-other-no_rescore] %WER 14.59% [7639 / 52343, 142 ins, 4293 del, 3204 sub ]

average of epoch 7 and 8

[test-clean-no_rescore] %WER 5.25% [2760 / 52576, 130 ins, 1035 del, 1595 sub ]
[test-other-no_rescore] %WER 11.71% [6131 / 52343, 159 ins, 2822 del, 3150 sub ]

average of epoch 6, 7, and 8

[test-clean-no_rescore] %WER 4.83% [2542 / 52576, 126 ins, 883 del, 1533 sub ]
[test-other-no_rescore] %WER 11.07% [5793 / 52343, 189 ins, 2513 del, 3091 sub ]

average of epoch 5, 6, 7, and 8

[test-clean-no_rescore] %WER 4.68% [2462 / 52576, 135 ins, 757 del, 1570 sub ]
[test-other-no_rescore] %WER 10.74% [5621 / 52343, 196 ins, 2187 del, 3238 sub ]

average of epoch 4, 5, 6, 7, and 8

[test-clean-no_rescore] %WER 4.64% [2438 / 52576, 139 ins, 706 del, 1593 sub ]
[test-other-no_rescore] %WER 10.63% [5562 / 52343, 200 ins, 1975 del, 3387 sub ]

average of epoch 3, 4, 5, 6, 7, and 8

[test-clean-no_rescore] %WER 4.81% [2527 / 52576, 150 ins, 710 del, 1667 sub ]
[test-other-no_rescore] %WER 11.21% [5867 / 52343, 218 ins, 1992 del, 3657 sub ]

average of epoch 2, 3, 4, 5, 6, 7, and 8

[test-clean-no_rescore] %WER 6.02% [3165 / 52576, 149 ins, 984 del, 2032 sub ]
[test-other-no_rescore] %WER 14.58% [7634 / 52343, 262 ins, 2751 del, 4621 sub ]

danpovey · 2021-07-31T07:36:11Z

We'll address the remaining comments from @pzelasko later on.

danpovey · 2021-07-31T07:54:08Z

Great!
BTW, perhaps we could put shared executable scripts like parse_options.sh in shared/, which can be a soft link, rather
than local/.

csukuangfj · 2021-07-31T07:59:01Z

Where should the shared directory be, icefall/shared?

danpovey · 2021-07-31T08:00:28Z

Sure, that sounds OK.
.. also I think it might be a good idea, in our data-directory hierarchy, to make a very clear distinction between data that might be written to by local scripts, and data that is simply downloaded from elsewhere. I'd like to make it possibe to set the data-download dir to some common (possibly-writable) directory so that if multiple people share it, it will just do the appropriate caching.

csukuangfj · 2021-07-31T08:02:51Z

.. also I think it might be a good idea, in our data-directory hierarchy, to make a very clear distinction between data that might be written to by local scripts, and data that is simply downloaded from elsewhere. I

I agree. Will do it.

pzelasko · 2021-07-31T13:50:54Z

Cool! Nice work. I wondered about the choice of num buckets 1000, what was your motivation?

…

Wiadomość napisana przez Fangjun Kuang ***@***.***> w dniu 7/31/21, o godz. 04:03: .. also I think it might be a good idea, in our data-directory hierarchy, to make a very clear distinction between data that might be written to by local scripts, and data that is simply downloaded from elsewhere. I I agree. Will do it. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

csukuangfj · 2021-07-31T14:51:44Z

Cool! Nice work. I wondered about the choice of num buckets 1000, what was your motivation?

The training options are copied from Liyong's work in
https://github.com/k2-fsa/snowfall/pull/219/files#diff-fab538403388f199d70e4194f70d212991f5f72cad14f8357c4c40d4a2699ad4

@glynpu Maybe Liyong has something to say about this.

Add style check tools.

71c4e29

csukuangfj added 2 commits July 15, 2021 19:52

Remove mypy.

d146a4e

Download LM for LibriSpeech.

40eed74

Compute features of librispeech and musan.

0b19aa0

pzelasko reviewed Jul 19, 2021

View reviewed changes

csukuangfj added 3 commits July 20, 2021 00:14

Fixes after review.

f25eedf

Minor fixes after review.

e005ea0

Add prepare_lang.py based on prepare_lang.sh

d5e0408

Minor fixes.

8a72901

Add self-loops to propagate disambiguation symbols.

a01d08f

danpovey reviewed Jul 21, 2021

View reviewed changes

Add CTC training.

f3542c7

csukuangfj changed the title ~~Add style check tools.~~ Add CTC training Jul 24, 2021

csukuangfj added 3 commits July 24, 2021 17:47

Add CI test.

2e33e24

Fix CI dependencies installation.

ee83a3e

Fix CI.

5443618

csukuangfj added 3 commits July 26, 2021 08:08

Fix loading checkpoint in DDP training.

d3101fb

WIP: Begin to add BPE decoding

4ccae50

Add BPE decoding results.

f65854c

Use attention decoder for rescoring.

bd69e4b

csukuangfj added 3 commits July 29, 2021 20:23

WIP: Add BPE training code.

acc63a9

Disable gradient computation in evaluation mode.

b94d97d

Minor fixes to support DDP training.

398ed80

danpovey merged commit cf8d762 into k2-fsa:master Jul 31, 2021

csukuangfj mentioned this pull request Aug 3, 2021

Refactoring #4

Merged

1 task

danpovey mentioned this pull request Sep 2, 2021

RuntimeError: Specified device cuda:0 does not match device of data cuda:-2 #33

Closed

Lzhang-hub mentioned this pull request Oct 11, 2021

CUDA out of memory in decoding #70

Open

danpovey mentioned this pull request Nov 27, 2021

Decoding error 'Fsa' object doesn't support assignment. #133

Open

wwxm0523 mentioned this pull request Jan 30, 2022

LF-MMI GPU OOM #196

Open

ahazned mentioned this pull request Apr 13, 2022

Illegal memory error when training with multi-GPU #247

Open

iggygeek mentioned this pull request Nov 29, 2023

Zipformer training crash : 'cannot set number of interop threads ' ... #1395

Closed

ngoel17 mentioned this pull request Sep 30, 2024

Illegal memory access during zipformer training #1764

Closed

	# If you have pre-downloaded it to /path/to/LibriSpeech,
	# you can create a symlink to avoid downloading it again:
	#
	# ln -sfv /path/to/LibriSpeech data/
	#



		@contextmanager
		def get_executor():

Add CTC training #3

Add CTC training #3

Conversation

csukuangfj commented Jul 15, 2021 • edited Loading

pzelasko commented Jul 15, 2021

csukuangfj commented Jul 15, 2021

csukuangfj commented Jul 15, 2021

pzelasko commented Jul 15, 2021

danpovey commented Jul 16, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

csukuangfj commented Jul 19, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pzelasko commented Jul 19, 2021

csukuangfj commented Jul 20, 2021 • edited Loading

Input lexicon.txt

lexicon_disambig.txt

phones.txt

words.txt

L.fst

L_disambig.fst

danpovey commented Jul 20, 2021

csukuangfj commented Jul 21, 2021

danpovey commented Jul 21, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

csukuangfj commented Jul 27, 2021

HLG - no LM rescoring

1-best decoding

n-best decoding

HLG - with LM rescoring

Whole lattice rescoring

n-best LM rescoring

danpovey commented Jul 27, 2021

pzelasko commented Jul 27, 2021

csukuangfj commented Jul 27, 2021

csukuangfj commented Jul 27, 2021

danpovey commented Jul 29, 2021 via email

csukuangfj commented Jul 31, 2021 • edited Loading

epoch-0

epoch-1

average of epoch 0 and 1

epoch-2

average of epoch 1 and 2

average of epoch 0, 1 and 2

epoch-3

average of epoch 2 and 3

average of epoch 1, 2 and 3

average of epoch 0, 1, 2 and 3

epoch 4

average of epoch 3 and 4

average of epoch 2, 3 and 4

average of epoch 1, 2, 3 and 4

average of epoch 0, 1, 2, 3 and 4

epoch 5

average of epoch 4 and 5

average of epoch 3, 4 and 5

average of epoch 2, 3, 4 and 5

average of epoch 1, 2, 3, 4 and 5

average of epoch 0, 1, 2, 3, 4 and 5

epoch 6

average of epoch 5 and 6

average of epoch 4, 5 and 6

average of epoch 3, 4, 5 and 6

average of epoch 2, 3, 4, 5 and 6

average of epoch 1, 2, 3, 4, 5 and 6

average of epoch 0, 1, 2, 3, 4, 5 and 6

epoch 7

average of epoch 6 and 7

average of epoch 5, 6 and 7

average of epoch 4, 5, 6 and 7

average of epoch 3, 4, 5, 6 and 7

average of epoch 2, 3, 4, 5, 6 and 7

average of epoch 1, 2, 3, 4, 5, 6 and 7

csukuangfj commented Jul 15, 2021 •

edited

Loading

csukuangfj commented Jul 20, 2021 •

edited

Loading

csukuangfj commented Jul 31, 2021 •

edited

Loading