[WIP] update bpe models and integrate 4-gram rescore #227
base: master
Conversation
Here is the log when the program crashes while decoding test-other:
Will have a look. Probably tomorrow.
Can you find the code where it gets 'index' from? Possibly we failed to do clone() at some point to make it a stride-1 tensor if it came from an FSA (but it's still very odd). You may be able to replicate the failure in pdb and debug it that way (let me know by WeChat if running it in pdb shows an error, because I may be able to remember the fix).
The line numbers in utils.py don't seem to match the current master.
from snowfall.training.mmi_graph import get_phone_symbols
def nbest_decoding(lats: k2.Fsa, num_paths: int):
I assume this nbest_decoding function is still here as some kind of demo? It didn't help vs. just one-best, right?
I plan to combine this nbest_decoding with transformer-decoder n-best rescoring.
Currently only the encoder model is used; the transformer-decoder model may be used as a rescoring "language model".
I am still working on this.
if [ $stage -le 2 ]; then
  dir=data/lang_bpe2
  mkdir -p $dir
  token_file=./data/en_token_list/bpe_unigram5000/tokens.txt
Where does this come from (data/en_token_list/bpe_unigram5000/tokens.txt)?
Currently they are downloaded from the snowfall_model_zoo together with the neural net models.
Originally, they were trained with the sentencepiece tokenizer, which is also used by ESPnet and #215.
To make this easier to review, this PR is mainly about the decoding part.
The tokenizer training part will be submitted with the model training part in #219.
Result of n-best rescoring with the transformer decoder:
Detailed errors:
paths = k2.random_paths(lats, num_paths=num_paths, use_double_scores=True)
# token_seqs/word_seqs is a k2.RaggedInt sharing the same shape as `paths`
# but it contains word IDs. Note that it also contains 0s and -1s.
Does only word_seqs contain word IDs? I feel the whole sentence applies to both token_seqs and word_seqs since you're using token_seqs/word_seqs.
> Does only word_seqs contain word IDs?

Yes.

> I feel the whole sentence applies to both token_seqs and word_seqs since you're using token_seqs/word_seqs.

You are right. Sorry for the confusing statement. token_seqs/word_seqs means (token_seqs or word_seqs).
N-best rescore with transformer-decoder model.
The basic idea is to first extract n-best paths from the given lattice.
Then extract word_seqs and token_seqs for each path.
Compute the negative log-likelihood for each token_seq as a 'language model score', called decoder_scores.
Is it a typo here? Why is the log-likelihood negative?
Actually NOT a typo. It's computed by torch.nn.functional.cross_entropy whose result is negative log-likelihood.
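For illustration, here is a minimal, self-contained sketch (with made-up shapes and values, not the PR's actual tensors) of how cross_entropy yields a negative log-likelihood that is then negated into a decoder score:

```python
import torch
import torch.nn.functional as F

# hypothetical sizes: 1 sequence of 5 tokens over a 500-token vocabulary
logits = torch.randn(1, 5, 500)          # decoder output: (batch, seq_len, vocab)
targets = torch.randint(0, 500, (1, 5))  # token_seq to be scored

# cross_entropy returns -log p(target | context) per token with reduction='none';
# it expects input shaped (batch, vocab, seq_len), hence the transpose
nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction='none')

# negate and sum so that, like am_scores, a larger value means "more likely"
decoder_scores = -nll.sum(dim=1)
print(decoder_scores.shape)  # torch.Size([1])
```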
According to the comment, decoder_scores is the negative log-likelihood for each token_seq. Can we remove "negative"?
What is the LM scale? I would imagine that when using the transformer decoder, we'd need to scale down the LM probabilities, because that decoder would already account for the LM prob.
fgram_lm_lats = k2.top_sort(k2.connect(fgram_lm_lats.to('cpu')).to(lats.device))
# am_scores is computed with log_semiring=True
# set log_semiring=True here to make fgram_lm_scores comparable to am_scores
fgram_tot_scores = fgram_lm_lats.get_tot_scores(use_double_scores=True, log_semiring=True)
Please see #214
The 2nd arg to get_tot_scores() here, representing log_semiring, should be false, because ARPA-type language models are constructed in such a way that the backoff prob is included in the direct arc. I.e. we would be double-counting if we were to sum the probabilities of the non-backoff and backoff arcs.
Have you tried log_semiring=False?
Not yet, will try it.
log_semiring=False is a little better than log_semiring=True (3.84% vs. 3.86%) with num_paths=500.
INFO:root:[test-clean-lm_scale_0.6] %WER 2.84% [1491 / 52576, 203 ins, 135 del, 1153 sub ]
- fgram_tot_scores = fgram_lm_lats.get_tot_scores(use_double_scores=True, log_semiring=True)
+ fgram_tot_scores = fgram_lm_lats.get_tot_scores(use_double_scores=True, log_semiring=False)
nll = model.decoder_nll(encoder_memory, memory_mask, token_ids=token_ids)
assert nll.shape[0] == num_seqs
decoder_scores = - nll.sum(dim=1)
tot_scores = am_scores + fgram_lm_scores + decoder_scores
Could you try different weights for the three components of tot_scores?
Is there a recommended range for these three weights?
I suggest trying different combinations. For instance:
am_scale = 0.5
ngram_lm_scale = 0.3
nn_lm_scale = 1 - am_scale - ngram_lm_scale
tot_scores = am_scale * am_scores + ngram_lm_scale * fgram_lm_scores + nn_lm_scale * decoder_scores
You may need to tune the scales for different kinds of scores.
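To make the suggestion concrete, here is a small runnable sketch of sweeping the three scales; the score tensors are dummy values, whereas in the PR they would be the per-path am_scores, fgram_lm_scores, and decoder_scores:

```python
import torch

# dummy per-path scores for 3 candidate paths; higher is better
am_scores = torch.tensor([-10.0, -9.5, -11.0])
fgram_lm_scores = torch.tensor([-4.0, -4.2, -3.8])
decoder_scores = torch.tensor([-6.0, -5.5, -6.5])

for am_scale in (0.3, 0.5, 0.7):
    for ngram_lm_scale in (0.1, 0.3, 0.5):
        nn_lm_scale = 1.0 - am_scale - ngram_lm_scale
        tot_scores = (am_scale * am_scores
                      + ngram_lm_scale * fgram_lm_scores
                      + nn_lm_scale * decoder_scores)
        best = torch.argmax(tot_scores).item()
        print(f"am={am_scale} ngram={ngram_lm_scale} nn={nn_lm_scale:.1f} -> best path {best}")
```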
Currently no scale is applied. Do you mean assigning a weight less than one to lm_scores?
lats = k2.arc_sort(lats)
fgram_lm_lats = _intersect_device(lats, token_fsas_with_epsilon_loops, path_to_seq_map, sorted_match_a=True)
fgram_lm_lats = k2.top_sort(k2.connect(fgram_lm_lats.to('cpu')).to(lats.device))
Please update to the latest k2, which supports running k2.connect on CUDA. You can use:
fgram_lm_lats = k2.top_sort(k2.connect(fgram_lm_lats))
num_seqs = len(token_ids)
time_steps = encoder_memory.shape[0]
feature_dim = encoder_memory.shape[2]
encoder_memory = encoder_memory.expand(time_steps, num_seqs, feature_dim)
Can this line and the following line be removed? I think they are redundant and equivalent to a no-op.
No.
Before expand: encoder_memory.shape = (time_steps, 1, feature_dim)
After expand: encoder_memory.shape = (time_steps, num_seqs, feature_dim)
(BTW, that's why my implementation only supports batch_size=1; I am still figuring out a way to handle this encoder_memory.)
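A tiny illustration of what the expand does here, with made-up sizes (not the PR's actual tensors):

```python
import torch

time_steps, feature_dim, num_seqs = 100, 256, 10   # hypothetical sizes
encoder_memory = torch.randn(time_steps, 1, feature_dim)

# expand broadcasts the singleton batch dimension without copying memory,
# so the same encoder output is shared by all num_seqs n-best hypotheses
encoder_memory = encoder_memory.expand(time_steps, num_seqs, feature_dim)
print(encoder_memory.shape)  # torch.Size([100, 10, 256])
```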
decoder_scores = - nll.sum(dim=1)
tot_scores = am_scores + fgram_lm_scores + decoder_scores
best_seq_idx = new2old[torch.argmax(tot_scores)]
best_word_seq = [k2.ragged.to_list(word_seqs)[0][best_seq_idx]]
Does it work when there is more than one sequence, i.e., when batch_size > 1?
It does not work yet, because I am still figuring out a way to handle encoder_memory.
# `new2old` is a 1-D torch.Tensor mapping from the output path index
# to the input path index.
# new2old.numel() == unique_word_seqs.num_elements()
unique_token_seqs, _, new2old = k2.ragged.unique_sequences(
Could you use the approach we are using in the current master? That is, use unique_word_seqs, not unique_token_seqs, to compute the lm_scores. Different token seqs in unique_token_seqs may correspond to the same word seq. lm_scores is for word seqs, not token seqs.
Will try it.
Now I use unique_token_seqs rather than unique_word_seqs for the following two reasons:
- token_seq is always a 1-to-1 map to word_seq, so there should not be many ambiguities.
- The transformer decoder is trained on token_seqs. unique_token_seqs is already generated for the transformer decoder, so I use it to get lm_scores.
Actually, when you want to get word_seq from token_seq, just do:
word_seq = ''.join(token_seq).replace('_', ' ')
> token_seq is always a 1-to-1 map to word_seq, so there should not be many ambiguities.

Are there epsilons (0s) in token seqs? Are there contiguous repeated tokens in token seqs? Token seqs from the above two cases can correspond to the same word seq, I think.

> The transformer decoder is trained on token_seqs. unique_token_seqs is already generated for the transformer decoder, so I use it to get lm_scores.

Is it possible to get the token seq from a word seq given the word piece model?
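If it helps, a minimal sketch of mapping a word seq back to a token seq with sentencepiece; the model path and text below are hypothetical, not files shipped with this PR:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file='data/lang_bpe2/bpe.model')  # hypothetical path

word_seq = 'HELLO WORLD'
token_ids = sp.encode(word_seq, out_type=int)  # token IDs, as fed to the decoder
pieces = sp.encode(word_seq, out_type=str)     # word pieces, e.g. ['▁HELLO', '▁WORLD']

# and the reverse direction mentioned above (sentencepiece uses '▁' as the word marker)
recovered = ''.join(pieces).replace('▁', ' ').strip()
assert recovered == word_seq
```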
Fangjun is right that we should use unique_word_seqs, because even though it's a 1-to-1 map, that won't be obvious to k2.ragged.unique_sequences; many of them will really be repeats. When composing the LM with the CTC topo, we need to keep the "inner_labels" as an attribute. I believe compose() has an arg "inner_labels_name" or something like that, so the inner (matched) labels can be kept.
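For reference, a sketch of what keeping the inner labels could look like; it assumes the installed k2 version's compose() accepts an inner_labels argument, and ctc_topo / G are placeholder FSAs built elsewhere:

```python
import k2

# ctc_topo and G are assumed to be k2.Fsa objects built elsewhere (placeholders).
# inner_labels='tokens' asks compose() to store the matched (inner) labels of the
# composition as an attribute, so decoding_graph.tokens is available after decoding.
decoding_graph = k2.compose(ctc_topo, k2.arc_sort(G), inner_labels='tokens')
decoding_graph = k2.connect(decoding_graph)
```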
I often see people using a combination of weights whose sum is 1.
'--avg',
type=int,
default=10,
help="Number of checkpionts to average. Automaticly select "
help="Number of checkpionts to average. Automaticly select " | |
help="Number of checkpionts to average. Automatically select " |
Computing am/4-gram lm_scores with unique_token_seqs seems a little better than with unique_word_seqs, across a variety of combinations of lm_scale and decoder_scale.
WER of test_clean with compute_am_flm_scores_1, computed with unique_word_seqs.
WER of test_clean with compute_am_flm_scores_2, computed with unique_token_seqs.
Log of compute_am_flm_scores_1:
Log of compute_am_flm_scores_2:
I just want to make sure you know how to get the unique token sequences from paths in the FSA. (Not sure if this is
Did you use a batch size of 1? If your decoding result is an empty FSA, you will encounter this kind of error (snowfall/snowfall/decoding/lm_rescore.py, line 306 in 5c979cc).
The reason is that the following line
if src_name == 'labels':
    value = value.clone()
returns a tensor with stride == 4 if value is empty.
We should modify the code that crashes to be insensitive to the stride if any of the dims is zero. Kangwei, perhaps you could do that?
Sure.
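A small sketch of one way to make that check stride-insensitive (an assumption about the shape of the fix, not the actual snowfall code):

```python
import torch

def ensure_row_contiguous(value: torch.Tensor) -> torch.Tensor:
    # an empty tensor may report an arbitrary stride (e.g. 4), so only enforce
    # the stride-1 requirement when the tensor actually contains data
    if value.numel() == 0:
        return value
    if value.stride(-1) != 1:
        value = value.contiguous()
    return value
```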
After removing repeated tokens and using log_semiring=False, WER on test-clean decreased from 2.81 (last week) to 2.73 (now). Detailed results with different scale combinations:
The result with batch_size > 1 is a little worse than with batch_size == 1 (2.74 vs. 2.73). Detailed results:
As suggested by fangjun, the crash when decoding test-other is solved by batch_size > 1.
Fantastic! I don't think those small differences in WER are significant, likely just noise.
On Tue, Jul 13, 2021 at 8:04 PM LIyong.Guo wrote:

> As suggested by fangjun, the crash when decoding test-other is solved by batch_size > 1.
> Current results are:

| | WER% on test_clean | WER% on test_other |
| --- | --- | --- |
| Encoder + ctc | 3.32 | 7.96 |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore | 2.92 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=100 | 2.87 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=500 | 2.86 | (to be tested) |
| + log_semiring=False and remove repeated tokens | 2.73 | 6.11 |
A better model is obtained with the following modifications:
Detailed WER on test-clean:
Results with different combinations of decoder_scale and lm_scale:
Great!!
# fgram means four-gram
fgram_rescored_lattices = rescore_with_whole_lattice(lattices, G,
                                                     lm_scale_list=None,
                                                     need_rescored_lats=True)
I think we should return here when fgram_rescored_lattices is empty:
if fgram_rescored_lattices.num_arcs == 0:
    return dict()
I fixed the crash when running with a batch size equal to one, see k2-fsa/k2#782.
But it still has some problems when running the transformer decoder with an empty input.
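One possible workaround, sketched under the assumption that the rescoring code can simply skip utterances with an empty n-best list (the function name and return convention are made up):

```python
from typing import Dict, List

import torch


def decoder_rescore_safe(model, encoder_memory, memory_mask,
                         token_ids: List[List[int]]) -> Dict[str, torch.Tensor]:
    """Hypothetical wrapper: never call the transformer decoder with empty input."""
    token_ids = [t for t in token_ids if len(t) > 0]
    if len(token_ids) == 0:
        # nothing to rescore; the caller falls back to the lattice-only result
        return dict()
    nll = model.decoder_nll(encoder_memory, memory_mask, token_ids=token_ids)
    return {'decoder_scores': -nll.sum(dim=1)}
```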
Hi glynpu:
The current PR is mainly about the decoding part.
Thanks! @glynpu
Latest result with feat_batch_norm:
Result without feature_batch_norm:
WER result on test_clean: