
[WIP] update bpe models and integrate 4-gram rescore #227

Open · wants to merge 9 commits into master

Conversation

@glynpu
Contributor

glynpu commented Jul 5, 2021

  1. A better model, trained with CTC + label-smoothing loss (WIP: BPE Training ctc loss and label smooth loss #219), is released.
  2. 4-gram rescoring is integrated, with reference to WIP: Add BPE training with LF-MMI #215.

Latest result with feat_batch_norm

|   | WER% on test-clean | WER% on test-other |
| --- | --- | --- |
| Encoder + ctc | 2.98 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=500 | 2.54 | (to be tested) |

Result without feature_batch_norm

|   | WER% on test-clean | WER% on test-other |
| --- | --- | --- |
| Encoder + ctc | 3.32 | 7.96 |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore | 2.92 | (failed when decoding, working on this) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=100 | 2.87 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=500 | 2.86 | (to be tested) |
| + log_semiring=False and remove repeated tokens | 2.73 | 6.11 |

WER result on test-clean: (image attachment omitted)

@glynpu
Contributor Author

glynpu commented Jul 5, 2021

Here is the log from when the program crashed while decoding test-other:

INFO:root:batch 1910, cuts processed until now is 1943/2939 (66.110922%)
INFO:root:batch 1920, cuts processed until now is 1953/2939 (66.451174%)
INFO:root:batch 1930, cuts processed until now is 1963/2939 (66.791426%)
[F] /ceph-ly/open-source/latest_k2/k2/k2/python/csrc/torch/torch_util.h:122:k2::Array1<U> k2::FromTorch(at::Tensor&) [with T = int] Check failed: tensor.strides()[0] == 1 (4 vs. 1) Expected stride: 1. Given: 4

[ Stack-Trace: ]
/ceph-ly/open-source/latest_k2/k2/build/lib/libk2_log.so(k2::internal::GetStackTrace()+0x5b) [0x7fd36a0f66ba]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x70a52) [0x7fd36b423a52]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0xb8c8f) [0x7fd36b46bc8f]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x1088bd) [0x7fd36b4bb8bd]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x102af4) [0x7fd36b4b5af4]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x11e695) [0x7fd36b4d1695]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x11db07) [0x7fd36b4d0b07]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x116d22) [0x7fd36b4c9d22]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x116f14) [0x7fd36b4c9f14]
/ceph-ly/open-source/latest_k2/k2/build/lib/_k2.cpython-38-x86_64-linux-gnu.so(+0x54187) [0x7fd36b407187]
python(PyCFunction_Call+0x56) [0x5ff8a6]
python(_PyObject_MakeTpCall+0x28f) [0x5fff6f]
python(_PyEval_EvalFrameDefault+0x5b9e) [0x57e35e]
python(_PyFunction_Vectorcall+0x19c) [0x602b2c]
python(PyVectorcall_Call+0x51) [0x5ff3b1]
/ceph-ly/py38/lib/python3.8/site-packages/torch/lib/libtorch_python.so(THPFunction_apply(_object*, _object*)+0x8fd) [0x7fd45ebdb78d]
python(PyCFunction_Call+0xfb) [0x5ff94b]
python(_PyObject_MakeTpCall+0x28f) [0x5fff6f]
python(_PyEval_EvalFrameDefault+0x5b9e) [0x57e35e]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x442) [0x602dd2]
python(_PyEval_EvalFrameDefault+0x1930) [0x57a0f0]
python(_PyFunction_Vectorcall+0x19c) [0x602b2c]
python(_PyEval_EvalFrameDefault+0x53f0) [0x57dbb0]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x442) [0x602dd2]
python(_PyEval_EvalFrameDefault+0x1930) [0x57a0f0]
python(_PyFunction_Vectorcall+0x19c) [0x602b2c]
python(PyVectorcall_Call+0x51) [0x5ff3b1]
python(_PyEval_EvalFrameDefault+0x1c4a) [0x57a40a]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x247) [0x602bd7]
python(_PyEval_EvalFrameDefault+0x619) [0x578dd9]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x442) [0x602dd2]
python(_PyEval_EvalFrameDefault+0x1930) [0x57a0f0]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x442) [0x602dd2]
python(PyVectorcall_Call+0x51) [0x5ff3b1]
python(_PyEval_EvalFrameDefault+0x1c4a) [0x57a40a]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x442) [0x602dd2]
python(_PyEval_EvalFrameDefault+0x1930) [0x57a0f0]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python(_PyFunction_Vectorcall+0x247) [0x602bd7]
python(_PyEval_EvalFrameDefault+0x619) [0x578dd9]
python(_PyEval_EvalCodeWithName+0x25c) [0x5765ec]
python() [0x662c2e]
python(PyRun_FileExFlags+0x97) [0x662d07]
python(PyRun_SimpleFileExFlags+0x17f) [0x663a1f]

Traceback (most recent call last):
  File "bpe_ctc_att_conformer_decode.py", line 617, in <module>
  File "bpe_ctc_att_conformer_decode.py", line 576, in main
    model=model,
  File "/ceph-ly/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "bpe_ctc_att_conformer_decode.py", line 278, in decode
    model=model,
  File "bpe_ctc_att_conformer_decode.py", line 240, in decode_one_batch
    lm_scale_list)
  File "/ceph-ly/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/ceph-ly/open-source/to_submit/lattice_rescore_snwofall/snowfall/snowfall/decoding/lm_rescore.py", line 320, in rescore_with_whole_lattice
    best_paths = k2.shortest_path(inv_lats, use_double_scores=True)
  File "/ceph-ly/open-source/latest_k2/k2/k2/python/k2/fsa_algo.py", line 541, in shortest_path
    out_fsa = k2.utils.fsa_from_unary_function_tensor(fsa, ragged_arc, arc_map)
  File "/ceph-ly/open-source/latest_k2/k2/k2/python/k2/utils.py", line 449, in fsa_from_unary_function_tensor
    setattr(dest, name, index_select(value, arc_map,
  File "/ceph-ly/open-source/latest_k2/k2/k2/python/k2/ops.py", line 159, in index_select
    ans = _IndexSelectFunction.apply(src, index, default_value)
  File "/ceph-ly/open-source/latest_k2/k2/k2/python/k2/ops.py", line 65, in forward
    return _k2.index_select(src, index, default_value)
RuntimeError: Some bad things happed.

@csukuangfj
Collaborator

> Here is the log from when the program crashed while decoding test-other: (full log quoted above)

Will have a look. Probably tomorrow.

@danpovey
Contributor

danpovey commented Jul 5, 2021

Can you find the code where it gets 'index' from? Possibly we failed to do clone() at some point to make it a stride-1 tensor if it came from an FSA (but it's still very odd). You may be able to replicate the failure in pdb and debug it that way (let me know by WeChat what error shows up when run under pdb, because I may be able to remember the fix).
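For illustration, a minimal PyTorch sketch (toy tensor, not this PR's code) of how a stride-4 tensor can arise from an FSA and how clone() restores stride 1:

```python
import torch

# k2 stores arcs as an (num_arcs, 4) int32 tensor; a column view of it
# keeps the row stride of 4, so the view is not contiguous
arcs = torch.zeros((3, 4), dtype=torch.int32)  # stand-in for fsa.arcs
labels = arcs[:, 2]                 # column view of the label field
print(labels.stride())              # (4,) -- what the k2 check rejects
print(labels.clone().stride())      # (1,) -- what k2::FromTorch expects
```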

@danpovey
Contributor

danpovey commented Jul 5, 2021

The line numbers in utils.py don't seem to match the current master.

from snowfall.training.mmi_graph import get_phone_symbols


def nbest_decoding(lats: k2.Fsa, num_paths: int):
Contributor

I assume this nbest_decoding function is still here as some kind of demo? It didn't help vs. just one-best, right?

Contributor Author

I plan to combine this nbest_decoding with transformer-decoder n-best rescoring.
Currently only the encoder model is used; the transformer-decoder model may be used as a rescoring "language model".
I am still working on this.

if [ $stage -le 2 ]; then
dir=data/lang_bpe2
mkdir -p $dir
token_file=./data/en_token_list/bpe_unigram5000/tokens.txt
Contributor

Where does this come from (data/en_token_list/bpe_unigram5000/tokens.txt)?

Contributor Author

glynpu commented Jul 5, 2021

Currently they are downloaded from the snowfall_model_zoo together with the neural-net models.
Originally, they were trained by the sentencepiece tokenizer, which is also used by ESPnet and #215.
To make this PR easier to review, it is mainly about the decoding part.
The tokenizer training part will be submitted with the model training part in #219.

@glynpu
Contributor Author

glynpu commented Jul 6, 2021

Results of n-best rescoring with the transformer decoder:

|   | WER% on test-clean | WER% on test-other |
| --- | --- | --- |
| Encoder + ctc | 3.32 | 7.96 |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore | 2.92 | (failed when decoding, working on this) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=100 | 2.87 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=500 | 2.86 | (to be tested) |

Detailed errors:

num-paths-for-decoder-rescore=100
INFO:root:[test-clean-lm_scale_0.6] %WER 2.87% [1510 / 52576, 207 ins, 130 del, 1173 sub ]
num-paths-for-decoder-rescore=500
INFO:root:[test-clean-lm_scale_0.6] %WER 2.86% [1505 / 52576, 207 ins, 128 del, 1170 sub ]

paths = k2.random_paths(lats, num_paths=num_paths, use_double_scores=True)

# token_seqs/word_seqs is a k2.RaggedInt sharing the same shape as `paths`
# but it contains word IDs. Note that it also contains 0s and -1s.
Collaborator

Does only word_seqs contain word IDs?
I feel the whole sentence applies to both token_seqs and word_seqs since
you're using token_seqs/word_seqs.

Contributor Author

> Does only word_seqs contain word IDs?

Yes.

> I feel the whole sentence applies to both token_seqs and word_seqs since
> you're using token_seqs/word_seqs.

You are right; sorry for the confusing statement.
token_seqs/word_seqs means (token_seqs or word_seqs).

N-best rescore with transformer-decoder model.
The basic idea is to first extract n-best paths from the given lattice.
Then extract word_seqs and token_seqs for each path.
Compute the negative log-likelihood for each token_seq as the "language model score", called decoder_scores.
Collaborator

Is it a typo here? Why is the log-likelihood negative?

Contributor Author

Actually NOT a typo. It's computed by torch.nn.functional.cross_entropy, whose result is the negative log-likelihood.
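A tiny illustration (toy logits and targets, assumed shapes; not this PR's model) of how cross_entropy yields an NLL that is negated into a score:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, 100)           # (seq_len, vocab_size)
targets = torch.randint(0, 100, (5,))  # token ids
# F.cross_entropy returns the negative log-likelihood (>= 0 per token)
nll = F.cross_entropy(logits, targets, reduction='none')
decoder_score = -nll.sum()             # log-likelihood: higher is better
```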

Collaborator

According to the comment, decoder_scores is the negative log-likelihood for each token_seq.
Can we remove "negative"?

@danpovey
Contributor

danpovey commented Jul 6, 2021

What is the LM scale? I would imagine that when using the transformer decoder, we'd need to scale down the LM probabilities, because that decoder would already account for the LM prob.

fgram_lm_lats = k2.top_sort(k2.connect(fgram_lm_lats.to('cpu')).to(lats.device))
# am_scores is computed with log_semiring=True
# set log_semiring=True here to make fgram_lm_scores comparable to am_scores
fgram_tot_scores = fgram_lm_lats.get_tot_scores(use_double_scores=True, log_semiring=True)
Collaborator

Please see #214.

The 2nd arg to get_tot_scores() here, representing log_semiring, should be False, because ARPA-type language models are constructed in such a way that the backoff prob is included in the direct arc. I.e., we would be double-counting if we were to sum the probabilities of the non-backoff and backoff arcs.

Have you tried using log_semiring=False?
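A toy numeric sketch (made-up probabilities) of the double-counting concern: with log_semiring=True the direct arc and the backoff path to the same word are summed, while the tropical semiring keeps only the better one:

```python
import math

direct = math.log(0.09)   # ARPA direct arc already includes the backoff mass
backoff = math.log(0.02)  # separate backoff path reaching the same word
log_sum = math.log(math.exp(direct) + math.exp(backoff))  # log_semiring=True
tropical = max(direct, backoff)                           # log_semiring=False
print(log_sum > tropical)  # True: the word's probability is over-counted
```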

Contributor Author

Not yet; will try it.

Contributor Author

log_semiring=False is a little better than log_semiring=True (2.84% vs. 2.86%) with num_paths=500.

INFO:root:[test-clean-lm_scale_0.6] %WER 2.84% [1491 / 52576, 203 ins, 135 del, 1153 sub ]

- fgram_tot_scores = fgram_lm_lats.get_tot_scores(use_double_scores=True, log_semiring=True)
+ fgram_tot_scores = fgram_lm_lats.get_tot_scores(use_double_scores=True, log_semiring=False)

nll = model.decoder_nll(encoder_memory, memory_mask, token_ids=token_ids)
assert nll.shape[0] == num_seqs
decoder_scores = - nll.sum(dim=1)
tot_scores = am_scores + fgram_lm_scores + decoder_scores
Collaborator

Could you try different weights for the three components of tot_scores?

Contributor Author

Is there a recommended range for these three weights?

Collaborator

I suggest trying different combinations.
For instance,

am_scale = 0.5
ngram_lm_scale = 0.3
nn_lm_scale = 1 - am_scale - ngram_lm_scale

tot_scores = am_scale * am_scores + ngram_lm_scale * fgram_lm_scores + nn_lm_scale * decoder_scores

You may need to tune the scales for different kinds of scores.

@glynpu
Contributor Author

glynpu commented Jul 6, 2021

> What is the LM scale?

Currently there is no scale, as:

    tot_scores = am_scores + fgram_lm_scores + decoder_scores

> we'd need to scale down the LM probabilities, because that decoder would already account for the LM prob.

Do you mean assigning a weight less than one to lm_scores? Like this:

-   tot_scores = am_scores + fgram_lm_scores + decoder_scores
+   lm_score_weight = 0.6 # just a value less than one
+   decoder_score_weight = 0.7 # just a value less than one
+   tot_scores = am_scores + lm_score_weight * fgram_lm_scores + decoder_score_weight * decoder_scores


lats = k2.arc_sort(lats)
fgram_lm_lats = _intersect_device(lats, token_fsas_with_epsilon_loops, path_to_seq_map, sorted_match_a=True)
fgram_lm_lats = k2.top_sort(k2.connect(fgram_lm_lats.to('cpu')).to(lats.device))
Collaborator

Please update to the latest k2, which supports running k2.connect on CUDA.
You can use

fgram_lm_lats = k2.top_sort(k2.connect(fgram_lm_lats))

num_seqs = len(token_ids)
time_steps = encoder_memory.shape[0]
feature_dim = encoder_memory.shape[2]
encoder_memory = encoder_memory.expand(time_steps, num_seqs, feature_dim)
Collaborator

Can this line and the following line be removed? I think they are redundant
and equivalent to a no-op.

Contributor Author

NO.
Before expand:
encoder_memory.shape = (time_steps, 1, feature_dim)
After expand:
encoder_memory.shape = (time_steps, num_seqs, feature_dim)

(BTW, that's why my implementation only supports batch_size=1; I am still figuring out a way to handle this encoder_memory.)
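A quick sketch (toy sizes) of the shape change described above; expand() broadcasts the singleton batch dimension without copying memory:

```python
import torch

time_steps, feature_dim, num_seqs = 7, 256, 10
encoder_memory = torch.randn(time_steps, 1, feature_dim)
expanded = encoder_memory.expand(time_steps, num_seqs, feature_dim)
print(expanded.shape)  # torch.Size([7, 10, 256])
```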

decoder_scores = - nll.sum(dim=1)
tot_scores = am_scores + fgram_lm_scores + decoder_scores
best_seq_idx = new2old[torch.argmax(tot_scores)]
best_word_seq = [k2.ragged.to_list(word_seqs)[0][best_seq_idx]]
Collaborator

csukuangfj commented Jul 6, 2021

Does it work when there is more than one sequence, i.e., when batch_size > 1?

Contributor Author

It does not work, because I am still figuring out a way to handle encoder_memory.

# `new2old` is a 1-D torch.Tensor mapping from the output path index
# to the input path index.
# new2old.numel() == unique_word_seqs.num_elements()
unique_token_seqs, _, new2old = k2.ragged.unique_sequences(
Collaborator

Could you use the approach we are using in the current master?

That is, use unique_word_seqs, not unique_token_seqs, to compute the lm_scores.

Different token seqs in unique_token_seqs may correspond to the same word seqs.
lm_scores is for word seqs, not token seqs.

Contributor Author

glynpu commented Jul 6, 2021

Will try it.
For now I use unique_token_seqs rather than unique_word_seqs for the following two reasons:

  1. token_seq is always a 1-to-1 map to word_seq. There should not be many ambiguities.
  2. The transformer decoder is trained on token_seqs. unique_token_seqs is already generated for the transformer decoder, so I use it to get lm_scores.

Actually, when you want to get a word_seq from a token_seq, just do:

word_seq = ''.join(token_seq).replace('_', ' ')
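A hypothetical illustration of that mapping (toy pieces); note that real SentencePiece models mark word starts with '▁' (U+2581) rather than an ASCII underscore:

```python
# toy BPE pieces joined back into words at the boundary marker
pieces = ['▁THE', '▁QUICK', '▁BRO', 'WN', '▁FOX']
text = ''.join(pieces).replace('▁', ' ').strip()
print(text)  # THE QUICK BROWN FOX
```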

Collaborator

> token_seq is always a 1-to-1 map to word_seq. There should not be many ambiguities.

Are there epsilons (0s) in the token seqs? Are there contiguous repeated tokens in the token seqs?
Token seqs from the above two cases can correspond to the same word seq, I think.

> The transformer decoder is trained on token_seqs. unique_token_seqs is already generated for the transformer decoder, so I use it to get lm_scores.

Is it possible to get the token seq from a word seq, given the word piece model?

Contributor

Fangjun is right that we should use the unique_word_seqs, because even though it's a 1-1 map, that won't be obvious to k2.ragged.unique_sequences; many of them will really be repeats. When composing the LM with the CTC topo, we need to keep the "inner_labels" as an attribute; I believe compose() has an arg "inner_labels_name" or something like that so the inner (matched) labels can be kept.

@csukuangfj
Collaborator

> Do you mean assigning a weight less than one to lm_scores? Like this:
>
> -   tot_scores = am_scores + fgram_lm_scores + decoder_scores
> +   lm_score_weight = 0.6 # just a value less than one
> +   decoder_score_weight = 0.7 # just a value less than one
> +   tot_scores = am_scores + lm_score_weight * fgram_lm_scores + decoder_score_weight * decoder_scores

I often see people using a combination of weights whose sum is 1.

'--avg',
type=int,
default=10,
help="Number of checkpionts to average. Automaticly select "
Collaborator

Suggested change
help="Number of checkpionts to average. Automaticly select "
help="Number of checkpionts to average. Automatically select "

@glynpu
Contributor Author

glynpu commented Jul 8, 2021

Computing am/4-gram lm_scores with unique_token_seqs seems a little better than with unique_word_seqs, across a variety of combinations of lm_scale and decoder_scale.

|   | WER% on test-clean | WER% on test-other |
| --- | --- | --- |
| Encoder + ctc | 3.32 | 7.96 |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore | 2.92 | (failed when decoding, working on this) |
| + transformer decoder n-best rescore computed with unique_word_seqs | 2.87 | (to be tested) |
| + transformer decoder n-best rescore computed with unique_token_seqs | 2.81 | (to be tested) |

WER on test-clean with compute_am_flm_scores_1, computed with unique_word_seqs:

lm_scale (rows) \ decoder_scale (columns): 0.01 0.03 0.05 0.08 0.09 0.1 0.3 0.5 0.7 0.9 1.0 2.0 4.0 6.0 8.0 10.0
0.1 3.04 3.02 3.02 2.99 3.0 2.99 3.07 3.17 3.25 3.33 3.33 3.52 3.6 3.66 3.69 3.71
0.3 2.96 2.94 2.94 2.94 2.93 2.94 3.05 3.17 3.24 3.31 3.34 3.52 3.61 3.66 3.69 3.72
0.5 2.93 2.91 2.89 2.88 2.87 2.89 3.04 3.14 3.27 3.33 3.36 3.52 3.62 3.67 3.7 3.71
0.6 2.91 2.89 2.88 2.89 2.88 2.89 3.04 3.16 3.26 3.34 3.37 3.53 3.62 3.67 3.7 3.72
0.7 2.93 2.93 2.91 2.91 2.9 2.9 3.06 3.16 3.28 3.33 3.36 3.53 3.62 3.67 3.71 3.73
0.9 3.14 3.1 3.09 3.06 3.05 3.05 3.13 3.24 3.31 3.37 3.4 3.55 3.65 3.69 3.72 3.74
1.0 3.33 3.28 3.25 3.2 3.21 3.21 3.21 3.29 3.37 3.4 3.43 3.59 3.67 3.7 3.74 3.74
2.0 5.63 5.56 5.53 5.47 5.45 5.43 5.06 4.68 4.39 4.18 4.11 3.82 3.8 3.8 3.81 3.8
4.0 6.13 6.11 6.1 6.1 6.09 6.08 5.97 5.84 5.69 5.56 5.49 4.75 4.06 3.92 3.87 3.86
6.0 6.23 6.22 6.22 6.2 6.21 6.21 6.15 6.08 5.99 5.91 5.89 5.44 4.65 4.14 3.96 3.92
8.0 6.3 6.3 6.28 6.28 6.28 6.27 6.23 6.19 6.13 6.09 6.04 5.79 5.08 4.61 4.22 4.02
10.0 6.32 6.32 6.31 6.31 6.31 6.31 6.27 6.24 6.2 6.16 6.14 5.93 5.42 4.9 4.58 4.27

WER on test-clean with compute_am_flm_scores_2, computed with unique_token_seqs:

lm_scale (rows) \ decoder_scale (columns): 0.01 0.03 0.05 0.08 0.09 0.1 0.3 0.5 0.7 0.9 1.0 2.0 4.0 6.0 8.0 10.0
0.1 3.02 3.0 2.98 2.95 2.94 2.94 2.9 2.87 2.88 2.89 2.88 2.89 2.91 2.92 2.93 2.94
0.3 2.97 2.95 2.93 2.91 2.9 2.9 2.85 2.86 2.86 2.85 2.86 2.89 2.9 2.93 2.93 2.94
0.5 2.92 2.92 2.91 2.88 2.88 2.88 2.85 2.82 2.83 2.85 2.85 2.88 2.91 2.93 2.94 2.94
0.6 2.92 2.89 2.9 2.88 2.86 2.86 2.84 2.83 2.83 2.84 2.85 2.88 2.92 2.93 2.94 2.94
0.7 2.94 2.93 2.93 2.9 2.9 2.89 2.82 2.82 2.83 2.84 2.84 2.89 2.92 2.93 2.94 2.94
0.9 3.14 3.11 3.07 3.01 3.0 2.99 2.88 2.82 2.81 2.82 2.84 2.89 2.93 2.94 2.94 2.94
1.0 3.3 3.25 3.19 3.14 3.12 3.11 2.91 2.85 2.83 2.82 2.83 2.89 2.93 2.94 2.94 2.95
2.0 5.53 5.48 5.45 5.38 5.35 5.33 4.7 4.11 3.72 3.5 3.39 2.97 2.93 2.94 2.94 2.95
4.0 6.09 6.08 6.06 6.05 6.05 6.04 5.86 5.6 5.29 4.98 4.85 3.95 3.14 2.98 2.94 2.95
6.0 6.19 6.19 6.19 6.17 6.18 6.17 6.08 5.94 5.79 5.61 5.5 4.67 3.76 3.25 3.02 2.98
8.0 6.25 6.25 6.25 6.25 6.24 6.23 6.16 6.08 5.97 5.87 5.81 5.18 4.18 3.67 3.3 3.09
10.0 6.28 6.28 6.27 6.27 6.27 6.26 6.21 6.15 6.08 5.99 5.94 5.48 4.54 3.99 3.63 3.36

log of compute_am_flm_scores_1:

lm_scale_0.5_decoder_scale_0.09	2.87	best for test-clean
lm_scale_0.5_decoder_scale_0.08	2.88
lm_scale_0.6_decoder_scale_0.05	2.88
lm_scale_0.6_decoder_scale_0.09	2.88
lm_scale_0.5_decoder_scale_0.1	2.89
lm_scale_0.5_decoder_scale_0.05	2.89
lm_scale_0.6_decoder_scale_0.1	2.89
lm_scale_0.6_decoder_scale_0.03	2.89
lm_scale_0.6_decoder_scale_0.08	2.89
lm_scale_0.7_decoder_scale_0.1	2.9
lm_scale_0.7_decoder_scale_0.09	2.9
lm_scale_0.5_decoder_scale_0.03	2.91
lm_scale_0.6_decoder_scale_0.01	2.91
lm_scale_0.7_decoder_scale_0.05	2.91
lm_scale_0.7_decoder_scale_0.08	2.91
lm_scale_0.3_decoder_scale_0.09	2.93
lm_scale_0.5_decoder_scale_0.01	2.93
lm_scale_0.7_decoder_scale_0.01	2.93
lm_scale_0.7_decoder_scale_0.03	2.93
lm_scale_0.3_decoder_scale_0.1	2.94
lm_scale_0.3_decoder_scale_0.03	2.94
lm_scale_0.3_decoder_scale_0.05	2.94
lm_scale_0.3_decoder_scale_0.08	2.94
lm_scale_0.3_decoder_scale_0.01	2.96
lm_scale_0.1_decoder_scale_0.1	2.99
lm_scale_0.1_decoder_scale_0.08	2.99
lm_scale_0.1_decoder_scale_0.09	3.0
lm_scale_0.1_decoder_scale_0.03	3.02
lm_scale_0.1_decoder_scale_0.05	3.02
lm_scale_0.1_decoder_scale_0.01	3.04
lm_scale_0.5_decoder_scale_0.3	3.04
lm_scale_0.6_decoder_scale_0.3	3.04
lm_scale_0.3_decoder_scale_0.3	3.05
lm_scale_0.9_decoder_scale_0.1	3.05
lm_scale_0.9_decoder_scale_0.09	3.05
lm_scale_0.7_decoder_scale_0.3	3.06
lm_scale_0.9_decoder_scale_0.08	3.06
lm_scale_0.1_decoder_scale_0.3	3.07
lm_scale_0.9_decoder_scale_0.05	3.09
lm_scale_0.9_decoder_scale_0.03	3.1
lm_scale_0.9_decoder_scale_0.3	3.13
lm_scale_0.5_decoder_scale_0.5	3.14
lm_scale_0.9_decoder_scale_0.01	3.14
lm_scale_0.6_decoder_scale_0.5	3.16
lm_scale_0.7_decoder_scale_0.5	3.16
lm_scale_0.1_decoder_scale_0.5	3.17
lm_scale_0.3_decoder_scale_0.5	3.17
lm_scale_1.0_decoder_scale_0.08	3.2
lm_scale_1.0_decoder_scale_0.1	3.21
lm_scale_1.0_decoder_scale_0.3	3.21
lm_scale_1.0_decoder_scale_0.09	3.21
lm_scale_0.3_decoder_scale_0.7	3.24
lm_scale_0.9_decoder_scale_0.5	3.24
lm_scale_0.1_decoder_scale_0.7	3.25
lm_scale_1.0_decoder_scale_0.05	3.25
lm_scale_0.6_decoder_scale_0.7	3.26
lm_scale_0.5_decoder_scale_0.7	3.27
lm_scale_0.7_decoder_scale_0.7	3.28
lm_scale_1.0_decoder_scale_0.03	3.28
lm_scale_1.0_decoder_scale_0.5	3.29
lm_scale_0.3_decoder_scale_0.9	3.31
lm_scale_0.9_decoder_scale_0.7	3.31
lm_scale_0.1_decoder_scale_0.9	3.33
lm_scale_0.1_decoder_scale_1.0	3.33
lm_scale_0.5_decoder_scale_0.9	3.33
lm_scale_0.7_decoder_scale_0.9	3.33
lm_scale_1.0_decoder_scale_0.01	3.33
lm_scale_0.3_decoder_scale_1.0	3.34
lm_scale_0.6_decoder_scale_0.9	3.34
lm_scale_0.5_decoder_scale_1.0	3.36
lm_scale_0.7_decoder_scale_1.0	3.36
lm_scale_0.6_decoder_scale_1.0	3.37
lm_scale_0.9_decoder_scale_0.9	3.37
lm_scale_1.0_decoder_scale_0.7	3.37
lm_scale_0.9_decoder_scale_1.0	3.4
lm_scale_1.0_decoder_scale_0.9	3.4
lm_scale_1.0_decoder_scale_1.0	3.43
lm_scale_0.1_decoder_scale_2.0	3.52
lm_scale_0.3_decoder_scale_2.0	3.52
lm_scale_0.5_decoder_scale_2.0	3.52
lm_scale_0.6_decoder_scale_2.0	3.53
lm_scale_0.7_decoder_scale_2.0	3.53
lm_scale_0.9_decoder_scale_2.0	3.55
lm_scale_1.0_decoder_scale_2.0	3.59
lm_scale_0.1_decoder_scale_4.0	3.6
lm_scale_0.3_decoder_scale_4.0	3.61
lm_scale_0.5_decoder_scale_4.0	3.62
lm_scale_0.6_decoder_scale_4.0	3.62
lm_scale_0.7_decoder_scale_4.0	3.62
lm_scale_0.9_decoder_scale_4.0	3.65
lm_scale_0.1_decoder_scale_6.0	3.66
lm_scale_0.3_decoder_scale_6.0	3.66
lm_scale_0.5_decoder_scale_6.0	3.67
lm_scale_0.6_decoder_scale_6.0	3.67
lm_scale_0.7_decoder_scale_6.0	3.67
lm_scale_1.0_decoder_scale_4.0	3.67
lm_scale_0.1_decoder_scale_8.0	3.69
lm_scale_0.3_decoder_scale_8.0	3.69
lm_scale_0.9_decoder_scale_6.0	3.69
lm_scale_0.5_decoder_scale_8.0	3.7
lm_scale_0.6_decoder_scale_8.0	3.7
lm_scale_1.0_decoder_scale_6.0	3.7
lm_scale_0.1_decoder_scale_10.0	3.71
lm_scale_0.5_decoder_scale_10.0	3.71
lm_scale_0.7_decoder_scale_8.0	3.71
lm_scale_0.3_decoder_scale_10.0	3.72
lm_scale_0.6_decoder_scale_10.0	3.72
lm_scale_0.9_decoder_scale_8.0	3.72
lm_scale_0.7_decoder_scale_10.0	3.73
lm_scale_0.9_decoder_scale_10.0	3.74
lm_scale_1.0_decoder_scale_8.0	3.74
lm_scale_1.0_decoder_scale_10.0	3.74
lm_scale_2.0_decoder_scale_4.0	3.8
lm_scale_2.0_decoder_scale_6.0	3.8
lm_scale_2.0_decoder_scale_10.0	3.8
lm_scale_2.0_decoder_scale_8.0	3.81
lm_scale_2.0_decoder_scale_2.0	3.82
lm_scale_4.0_decoder_scale_10.0	3.86
lm_scale_4.0_decoder_scale_8.0	3.87
lm_scale_4.0_decoder_scale_6.0	3.92
lm_scale_6.0_decoder_scale_10.0	3.92
lm_scale_6.0_decoder_scale_8.0	3.96
lm_scale_8.0_decoder_scale_10.0	4.02
lm_scale_4.0_decoder_scale_4.0	4.06
lm_scale_2.0_decoder_scale_1.0	4.11
lm_scale_6.0_decoder_scale_6.0	4.14
lm_scale_2.0_decoder_scale_0.9	4.18
lm_scale_8.0_decoder_scale_8.0	4.22
lm_scale_10.0_decoder_scale_10.0	4.27
lm_scale_2.0_decoder_scale_0.7	4.39
lm_scale_10.0_decoder_scale_8.0	4.58
lm_scale_8.0_decoder_scale_6.0	4.61
lm_scale_6.0_decoder_scale_4.0	4.65
lm_scale_2.0_decoder_scale_0.5	4.68
lm_scale_4.0_decoder_scale_2.0	4.75
lm_scale_10.0_decoder_scale_6.0	4.9
lm_scale_2.0_decoder_scale_0.3	5.06
lm_scale_8.0_decoder_scale_4.0	5.08
lm_scale_10.0_decoder_scale_4.0	5.42
lm_scale_2.0_decoder_scale_0.1	5.43
lm_scale_6.0_decoder_scale_2.0	5.44
lm_scale_2.0_decoder_scale_0.09	5.45
lm_scale_2.0_decoder_scale_0.08	5.47
lm_scale_4.0_decoder_scale_1.0	5.49
lm_scale_2.0_decoder_scale_0.05	5.53
lm_scale_2.0_decoder_scale_0.03	5.56
lm_scale_4.0_decoder_scale_0.9	5.56
lm_scale_2.0_decoder_scale_0.01	5.63
lm_scale_4.0_decoder_scale_0.7	5.69
lm_scale_8.0_decoder_scale_2.0	5.79
lm_scale_4.0_decoder_scale_0.5	5.84
lm_scale_6.0_decoder_scale_1.0	5.89
lm_scale_6.0_decoder_scale_0.9	5.91
lm_scale_10.0_decoder_scale_2.0	5.93
lm_scale_4.0_decoder_scale_0.3	5.97
lm_scale_6.0_decoder_scale_0.7	5.99
lm_scale_8.0_decoder_scale_1.0	6.04
lm_scale_4.0_decoder_scale_0.1	6.08
lm_scale_6.0_decoder_scale_0.5	6.08
lm_scale_4.0_decoder_scale_0.09	6.09
lm_scale_8.0_decoder_scale_0.9	6.09
lm_scale_4.0_decoder_scale_0.05	6.1
lm_scale_4.0_decoder_scale_0.08	6.1
lm_scale_4.0_decoder_scale_0.03	6.11
lm_scale_4.0_decoder_scale_0.01	6.13
lm_scale_8.0_decoder_scale_0.7	6.13
lm_scale_10.0_decoder_scale_1.0	6.14
lm_scale_6.0_decoder_scale_0.3	6.15
lm_scale_10.0_decoder_scale_0.9	6.16
lm_scale_8.0_decoder_scale_0.5	6.19
lm_scale_6.0_decoder_scale_0.08	6.2
lm_scale_10.0_decoder_scale_0.7	6.2
lm_scale_6.0_decoder_scale_0.1	6.21
lm_scale_6.0_decoder_scale_0.09	6.21
lm_scale_6.0_decoder_scale_0.03	6.22
lm_scale_6.0_decoder_scale_0.05	6.22
lm_scale_6.0_decoder_scale_0.01	6.23
lm_scale_8.0_decoder_scale_0.3	6.23
lm_scale_10.0_decoder_scale_0.5	6.24
lm_scale_8.0_decoder_scale_0.1	6.27
lm_scale_10.0_decoder_scale_0.3	6.27
lm_scale_8.0_decoder_scale_0.05	6.28
lm_scale_8.0_decoder_scale_0.08	6.28
lm_scale_8.0_decoder_scale_0.09	6.28
lm_scale_8.0_decoder_scale_0.01	6.3
lm_scale_8.0_decoder_scale_0.03	6.3
lm_scale_10.0_decoder_scale_0.1	6.31
lm_scale_10.0_decoder_scale_0.05	6.31
lm_scale_10.0_decoder_scale_0.08	6.31
lm_scale_10.0_decoder_scale_0.09	6.31
lm_scale_10.0_decoder_scale_0.01	6.32
lm_scale_10.0_decoder_scale_0.03	6.32

log of compute_am_flm_scores_2:

lm_scale_0.9_decoder_scale_0.7	2.81	best for test-clean
lm_scale_0.5_decoder_scale_0.5	2.82
lm_scale_0.7_decoder_scale_0.3	2.82
lm_scale_0.7_decoder_scale_0.5	2.82
lm_scale_0.9_decoder_scale_0.5	2.82
lm_scale_0.9_decoder_scale_0.9	2.82
lm_scale_1.0_decoder_scale_0.9	2.82
lm_scale_0.5_decoder_scale_0.7	2.83
lm_scale_0.6_decoder_scale_0.5	2.83
lm_scale_0.6_decoder_scale_0.7	2.83
lm_scale_0.7_decoder_scale_0.7	2.83
lm_scale_1.0_decoder_scale_0.7	2.83
lm_scale_1.0_decoder_scale_1.0	2.83
lm_scale_0.6_decoder_scale_0.3	2.84
lm_scale_0.6_decoder_scale_0.9	2.84
lm_scale_0.7_decoder_scale_0.9	2.84
lm_scale_0.7_decoder_scale_1.0	2.84
lm_scale_0.9_decoder_scale_1.0	2.84
lm_scale_0.3_decoder_scale_0.3	2.85
lm_scale_0.3_decoder_scale_0.9	2.85
lm_scale_0.5_decoder_scale_0.3	2.85
lm_scale_0.5_decoder_scale_0.9	2.85
lm_scale_0.5_decoder_scale_1.0	2.85
lm_scale_0.6_decoder_scale_1.0	2.85
lm_scale_1.0_decoder_scale_0.5	2.85
lm_scale_0.3_decoder_scale_0.5	2.86
lm_scale_0.3_decoder_scale_0.7	2.86
lm_scale_0.3_decoder_scale_1.0	2.86
lm_scale_0.6_decoder_scale_0.1	2.86
lm_scale_0.6_decoder_scale_0.09	2.86
lm_scale_0.1_decoder_scale_0.5	2.87
lm_scale_0.1_decoder_scale_0.7	2.88
lm_scale_0.1_decoder_scale_1.0	2.88
lm_scale_0.5_decoder_scale_0.1	2.88
lm_scale_0.5_decoder_scale_2.0	2.88
lm_scale_0.5_decoder_scale_0.08	2.88
lm_scale_0.5_decoder_scale_0.09	2.88
lm_scale_0.6_decoder_scale_2.0	2.88
lm_scale_0.6_decoder_scale_0.08	2.88
lm_scale_0.9_decoder_scale_0.3	2.88
lm_scale_0.1_decoder_scale_0.9	2.89
lm_scale_0.1_decoder_scale_2.0	2.89
lm_scale_0.3_decoder_scale_2.0	2.89
lm_scale_0.6_decoder_scale_0.03	2.89
lm_scale_0.7_decoder_scale_0.1	2.89
lm_scale_0.7_decoder_scale_2.0	2.89
lm_scale_0.9_decoder_scale_2.0	2.89
lm_scale_1.0_decoder_scale_2.0	2.89
lm_scale_0.1_decoder_scale_0.3	2.9
lm_scale_0.3_decoder_scale_0.1	2.9
lm_scale_0.3_decoder_scale_4.0	2.9
lm_scale_0.3_decoder_scale_0.09	2.9
lm_scale_0.6_decoder_scale_0.05	2.9
lm_scale_0.7_decoder_scale_0.08	2.9
lm_scale_0.7_decoder_scale_0.09	2.9
lm_scale_0.1_decoder_scale_4.0	2.91
lm_scale_0.3_decoder_scale_0.08	2.91
lm_scale_0.5_decoder_scale_4.0	2.91
lm_scale_0.5_decoder_scale_0.05	2.91
lm_scale_1.0_decoder_scale_0.3	2.91
lm_scale_0.1_decoder_scale_6.0	2.92
lm_scale_0.5_decoder_scale_0.01	2.92
lm_scale_0.5_decoder_scale_0.03	2.92
lm_scale_0.6_decoder_scale_4.0	2.92
lm_scale_0.6_decoder_scale_0.01	2.92
lm_scale_0.7_decoder_scale_4.0	2.92
lm_scale_0.1_decoder_scale_8.0	2.93
lm_scale_0.3_decoder_scale_6.0	2.93
lm_scale_0.3_decoder_scale_8.0	2.93
lm_scale_0.3_decoder_scale_0.05	2.93
lm_scale_0.5_decoder_scale_6.0	2.93
lm_scale_0.6_decoder_scale_6.0	2.93
lm_scale_0.7_decoder_scale_6.0	2.93
lm_scale_0.7_decoder_scale_0.03	2.93
lm_scale_0.7_decoder_scale_0.05	2.93
lm_scale_0.9_decoder_scale_4.0	2.93
lm_scale_1.0_decoder_scale_4.0	2.93
lm_scale_2.0_decoder_scale_4.0	2.93
lm_scale_0.1_decoder_scale_0.1	2.94
lm_scale_0.1_decoder_scale_10.0	2.94
lm_scale_0.1_decoder_scale_0.09	2.94
lm_scale_0.3_decoder_scale_10.0	2.94
lm_scale_0.5_decoder_scale_8.0	2.94
lm_scale_0.5_decoder_scale_10.0	2.94
lm_scale_0.6_decoder_scale_8.0	2.94
lm_scale_0.6_decoder_scale_10.0	2.94
lm_scale_0.7_decoder_scale_8.0	2.94
lm_scale_0.7_decoder_scale_10.0	2.94
lm_scale_0.7_decoder_scale_0.01	2.94
lm_scale_0.9_decoder_scale_6.0	2.94
lm_scale_0.9_decoder_scale_8.0	2.94
lm_scale_0.9_decoder_scale_10.0	2.94
lm_scale_1.0_decoder_scale_6.0	2.94
lm_scale_1.0_decoder_scale_8.0	2.94
lm_scale_2.0_decoder_scale_6.0	2.94
lm_scale_2.0_decoder_scale_8.0	2.94
lm_scale_4.0_decoder_scale_8.0	2.94
lm_scale_0.1_decoder_scale_0.08	2.95
lm_scale_0.3_decoder_scale_0.03	2.95
lm_scale_1.0_decoder_scale_10.0	2.95
lm_scale_2.0_decoder_scale_10.0	2.95
lm_scale_4.0_decoder_scale_10.0	2.95
lm_scale_0.3_decoder_scale_0.01	2.97
lm_scale_2.0_decoder_scale_2.0	2.97
lm_scale_0.1_decoder_scale_0.05	2.98
lm_scale_4.0_decoder_scale_6.0	2.98
lm_scale_6.0_decoder_scale_10.0	2.98
lm_scale_0.9_decoder_scale_0.1	2.99
lm_scale_0.1_decoder_scale_0.03	3.0
lm_scale_0.9_decoder_scale_0.09	3.0
lm_scale_0.9_decoder_scale_0.08	3.01
lm_scale_0.1_decoder_scale_0.01	3.02
lm_scale_6.0_decoder_scale_8.0	3.02
lm_scale_0.9_decoder_scale_0.05	3.07
lm_scale_8.0_decoder_scale_10.0	3.09
lm_scale_0.9_decoder_scale_0.03	3.11
lm_scale_1.0_decoder_scale_0.1	3.11
lm_scale_1.0_decoder_scale_0.09	3.12
lm_scale_0.9_decoder_scale_0.01	3.14
lm_scale_1.0_decoder_scale_0.08	3.14
lm_scale_4.0_decoder_scale_4.0	3.14
lm_scale_1.0_decoder_scale_0.05	3.19
lm_scale_1.0_decoder_scale_0.03	3.25
lm_scale_6.0_decoder_scale_6.0	3.25
lm_scale_1.0_decoder_scale_0.01	3.3
lm_scale_8.0_decoder_scale_8.0	3.3
lm_scale_10.0_decoder_scale_10.0	3.36
lm_scale_2.0_decoder_scale_1.0	3.39
lm_scale_2.0_decoder_scale_0.9	3.5
lm_scale_10.0_decoder_scale_8.0	3.63
lm_scale_8.0_decoder_scale_6.0	3.67
lm_scale_2.0_decoder_scale_0.7	3.72
lm_scale_6.0_decoder_scale_4.0	3.76
lm_scale_4.0_decoder_scale_2.0	3.95
lm_scale_10.0_decoder_scale_6.0	3.99
lm_scale_2.0_decoder_scale_0.5	4.11
lm_scale_8.0_decoder_scale_4.0	4.18
lm_scale_10.0_decoder_scale_4.0	4.54
lm_scale_6.0_decoder_scale_2.0	4.67
lm_scale_2.0_decoder_scale_0.3	4.7
lm_scale_4.0_decoder_scale_1.0	4.85
lm_scale_4.0_decoder_scale_0.9	4.98
lm_scale_8.0_decoder_scale_2.0	5.18
lm_scale_4.0_decoder_scale_0.7	5.29
lm_scale_2.0_decoder_scale_0.1	5.33
lm_scale_2.0_decoder_scale_0.09	5.35
lm_scale_2.0_decoder_scale_0.08	5.38
lm_scale_2.0_decoder_scale_0.05	5.45
lm_scale_2.0_decoder_scale_0.03	5.48
lm_scale_10.0_decoder_scale_2.0	5.48
lm_scale_6.0_decoder_scale_1.0	5.5
lm_scale_2.0_decoder_scale_0.01	5.53
lm_scale_4.0_decoder_scale_0.5	5.6
lm_scale_6.0_decoder_scale_0.9	5.61
lm_scale_6.0_decoder_scale_0.7	5.79
lm_scale_8.0_decoder_scale_1.0	5.81
lm_scale_4.0_decoder_scale_0.3	5.86
lm_scale_8.0_decoder_scale_0.9	5.87
lm_scale_6.0_decoder_scale_0.5	5.94
lm_scale_10.0_decoder_scale_1.0	5.94
lm_scale_8.0_decoder_scale_0.7	5.97
lm_scale_10.0_decoder_scale_0.9	5.99
lm_scale_4.0_decoder_scale_0.1	6.04
lm_scale_4.0_decoder_scale_0.08	6.05
lm_scale_4.0_decoder_scale_0.09	6.05
lm_scale_4.0_decoder_scale_0.05	6.06
lm_scale_4.0_decoder_scale_0.03	6.08
lm_scale_6.0_decoder_scale_0.3	6.08
lm_scale_8.0_decoder_scale_0.5	6.08
lm_scale_10.0_decoder_scale_0.7	6.08
lm_scale_4.0_decoder_scale_0.01	6.09
lm_scale_10.0_decoder_scale_0.5	6.15
lm_scale_8.0_decoder_scale_0.3	6.16
lm_scale_6.0_decoder_scale_0.1	6.17
lm_scale_6.0_decoder_scale_0.08	6.17
lm_scale_6.0_decoder_scale_0.09	6.18
lm_scale_6.0_decoder_scale_0.01	6.19
lm_scale_6.0_decoder_scale_0.03	6.19
lm_scale_6.0_decoder_scale_0.05	6.19
lm_scale_10.0_decoder_scale_0.3	6.21
lm_scale_8.0_decoder_scale_0.1	6.23
lm_scale_8.0_decoder_scale_0.09	6.24
lm_scale_8.0_decoder_scale_0.01	6.25
lm_scale_8.0_decoder_scale_0.03	6.25
lm_scale_8.0_decoder_scale_0.05	6.25
lm_scale_8.0_decoder_scale_0.08	6.25
lm_scale_10.0_decoder_scale_0.1	6.26
lm_scale_10.0_decoder_scale_0.05	6.27
lm_scale_10.0_decoder_scale_0.08	6.27
lm_scale_10.0_decoder_scale_0.09	6.27
lm_scale_10.0_decoder_scale_0.01	6.28
lm_scale_10.0_decoder_scale_0.03	6.28

@danpovey
Contributor

I just want to make sure you know how to get the unique token sequences from paths in the FSA. (Not sure if this is something that needs fixing, sorry.)
By unique token sequences I mean without the repeats that come from the CTC topo, or the epsilons.
The way to do this is to use inner_labels='tokens' or something like that when doing the composition with the CTC topo during graph construction, and then use fsa.tokens to obtain these from the lattices when you need them. Any other way may not be correct if we are using the new/simplified CTC topo, because any repeats of the same token will be converted into a single token, so certain words or word-sequences might become impossible to recognize.
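A sketch of that approach with toy graphs (the names and the tiny G below are assumptions, not this PR's code); k2.compose() accepts an inner_labels argument that keeps the matched labels as an attribute:

```python
import k2

# a CTC topology over 2 tokens and a tiny graph accepting [1, 2]
ctc_topo = k2.arc_sort(k2.ctc_topo(max_token=2))
G = k2.arc_sort(k2.Fsa.from_str('''
0 1 1 0.0
1 2 2 0.0
2 3 -1 0.0
3
'''))

# keep the matched (inner) labels as a 'tokens' attribute
graph = k2.compose(ctc_topo, G, inner_labels='tokens')

# the attribute is propagated through later FSA operations, so after
# decoding one can read lats.tokens to get token ids free of the
# CTC repeats and epsilons
print(graph.tokens)
```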

@csukuangfj
Collaborator

> Here is the log from when the program crashed while decoding test-other: (full log quoted above)

Did you use a batch size of 1? If your decoding result is an empty FSA, you will encounter this kind of error
when calling k2.shortest_path. The solution is to return rescoring_lats directly.

inv_lats = k2.invert(rescoring_lats)

The reason is that the following line
https://github.com/k2-fsa/k2/blob/069425e301472e7ea31ea982ba2a943ac5fcb649/k2/python/k2/fsa.py#L894

            if src_name == 'labels':
                value = value.clone()

returns a tensor with stride == 4 if value is empty.
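A minimal repro sketch (toy tensor) of how an empty labels view ends up with stride 4, tripping the `tensor.strides()[0] == 1` check quoted in the log:

```python
import torch

arcs = torch.empty((0, 4), dtype=torch.int32)  # an empty FSA has no arcs
labels = arcs[:, 2]                            # column view of the labels
print(labels.shape, labels.stride())           # torch.Size([0]) (4,)
# for a non-empty tensor, clone() would make this contiguous (stride 1);
# per the note above, the empty case keeps stride 4 and fails the check
```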

@danpovey
Contributor

We should modify the code that crashes to be insensitive to the stride if any of the dims is zero. Kangwei, perhaps you could do that?

@pkufool
Contributor

pkufool commented Jul 12, 2021

> We should modify the code that crashes to be insensitive to the stride if any of the dims is zero. Kangwei, perhaps you could do that?

Sure.

@glynpu
Contributor Author

glynpu commented Jul 12, 2021

> I just want to make sure you know how to get the unique token sequences from paths in the FSA. (Not sure if this is something that needs fixing, sorry.)

After removing repeated tokens and using log_semiring=False, the WER on test-clean decreases from 2.81 (last week) to 2.73 (now).

Detailed results with different scale combinations:

lm_scale (rows) \ decoder_scale (columns): 0.1 0.3 0.5 0.6 0.7 0.9 1.0 1.1 1.2 1.3 1.5 1.7 1.9 2.0
0.1 2.98 2.95 2.92 2.9 2.9 2.89 2.89 2.88 2.87 2.86 2.85 2.85 2.85 2.84
0.3 2.91 2.88 2.88 2.88 2.87 2.87 2.85 2.85 2.85 2.85 2.84 2.84 2.83 2.83
0.5 2.88 2.86 2.83 2.84 2.84 2.84 2.83 2.84 2.83 2.82 2.82 2.83 2.83 2.83
0.6 2.86 2.82 2.82 2.81 2.82 2.82 2.82 2.82 2.82 2.81 2.81 2.82 2.82 2.82
0.7 2.87 2.8 2.78 2.79 2.8 2.81 2.81 2.8 2.8 2.8 2.8 2.82 2.82 2.82
0.9 2.99 2.84 2.78 2.76 2.77 2.76 2.76 2.76 2.77 2.78 2.79 2.79 2.8 2.8
1.0 3.12 2.89 2.8 2.77 2.77 2.75 2.74 2.74 2.76 2.77 2.78 2.79 2.79 2.79
1.1 3.32 3.0 2.82 2.8 2.77 2.74 2.73 2.74 2.73 2.74 2.77 2.78 2.78 2.78
1.2 3.58 3.13 2.9 2.85 2.8 2.77 2.74 2.74 2.73 2.74 2.73 2.76 2.77 2.77
1.3 3.87 3.3 3.0 2.92 2.87 2.79 2.76 2.77 2.75 2.74 2.74 2.74 2.75 2.76
1.5 4.45 3.78 3.28 3.17 3.03 2.88 2.85 2.82 2.78 2.77 2.74 2.73 2.74 2.73
1.7 4.84 4.24 3.76 3.54 3.31 3.06 2.99 2.93 2.88 2.84 2.8 2.77 2.75 2.75
1.9 5.11 4.65 4.15 3.95 3.73 3.33 3.2 3.12 3.03 2.98 2.88 2.84 2.8 2.79
2.0 5.19 4.81 4.37 4.11 3.92 3.54 3.34 3.23 3.13 3.05 2.95 2.88 2.83 2.81

@glynpu
Contributor Author

glynpu commented Jul 13, 2021

The result with batch_size > 1 is slightly worse than with batch_size == 1: 2.74 vs. 2.73.
And the lowest WER is obtained with a different lm_scale/decoder_scale setting.

Detailed results:

lm_scale (rows) \ decoder_scale (columns): 0.1 0.3 0.5 0.6 0.7 0.9 1.0 1.1 1.2 1.3 1.5 1.7 1.9 2.0
0.1 2.99 2.98 2.94 2.92 2.92 2.92 2.91 2.91 2.9 2.9 2.89 2.89 2.89 2.89
0.3 2.9 2.9 2.9 2.9 2.9 2.89 2.88 2.87 2.88 2.88 2.86 2.86 2.86 2.86
0.5 2.88 2.85 2.85 2.87 2.86 2.85 2.85 2.86 2.85 2.85 2.85 2.85 2.86 2.86
0.6 2.86 2.83 2.82 2.82 2.84 2.84 2.84 2.84 2.85 2.85 2.85 2.85 2.86 2.86
0.7 2.86 2.81 2.79 2.8 2.81 2.83 2.83 2.83 2.84 2.84 2.85 2.86 2.85 2.85
0.9 2.98 2.84 2.79 2.76 2.77 2.78 2.78 2.8 2.81 2.82 2.82 2.82 2.83 2.84
1.0 3.12 2.88 2.81 2.79 2.77 2.76 2.76 2.78 2.79 2.81 2.82 2.81 2.82 2.82
1.1 3.31 3.0 2.83 2.81 2.79 2.76 2.75 2.75 2.75 2.77 2.8 2.8 2.81 2.81
1.2 3.59 3.13 2.9 2.85 2.81 2.79 2.77 2.76 2.75 2.76 2.76 2.79 2.8 2.8
1.3 3.87 3.3 3.01 2.93 2.87 2.79 2.78 2.79 2.77 2.76 2.76 2.77 2.78 2.79
1.5 4.43 3.81 3.29 3.17 3.05 2.9 2.87 2.84 2.8 2.78 2.77 2.74 2.75 2.75
1.7 4.86 4.28 3.79 3.56 3.32 3.07 3.0 2.95 2.89 2.87 2.82 2.79 2.78 2.77
1.9 5.15 4.68 4.17 3.96 3.74 3.33 3.21 3.13 3.04 2.99 2.88 2.85 2.82 2.81
2.0 5.22 4.83 4.37 4.13 3.92 3.55 3.34 3.24 3.14 3.07 2.95 2.87 2.84 2.82

@glynpu
Contributor Author

glynpu commented Jul 13, 2021

As suggested by Fangjun, the crash when decoding test-other is solved by using batch_size > 1.
Current results are:

|   | WER% on test-clean | WER% on test-other |
| --- | --- | --- |
| Encoder + ctc | 3.32 | 7.96 |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore | 2.92 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=100 | 2.87 | (to be tested) |
| Encoder + (ctc + 3-gram) + 4-gram lattice rescore + (transformer decoder n-best rescore), num-paths-for-decoder-rescore=500 | 2.86 | (to be tested) |
| + log_semiring=False and remove repeated tokens | 2.73 | 6.11 |

@danpovey
Contributor

danpovey commented Jul 13, 2021 via email

@glynpu
Contributor Author

glynpu commented Jul 21, 2021

A better model is obtained with the following modifications:

|   | feat-norm | learning-factor | warm-up steps | epochs |
| --- | --- | --- | --- | --- |
| before | no | 10 | 40,000 | 40 epochs (avg=10, over epochs 26-35) |
| current | yes | 5 | 80,000 (around 10 epochs) | 50 epochs (avg=20, over epochs 31-50) |

Detailed WER on test-clean:

|   | before | current |
| --- | --- | --- |
| Encoder + ctc | 3.32 | 2.98 (WER of the ESPnet released model is 2.97/3.00) |
| Encoder + TLG + 4-gram lattice rescore + n-best rescore with transformer decoder, with log_semiring=False and repeated tokens removed | 2.73 | 2.54 |

Results with different combinations of decoder_scale and lm_scale.
wer=2.54 is obtained with decoder_scale=1.7 and lm_scale=1.7.

lm_scale (rows) \ decoder_scale (columns): 0.1 0.3 0.5 0.6 0.7 0.9 1.0 1.1 1.2 1.3 1.5 1.7 1.9 2.0 2.1 2.2 2.3 2.4 2.5
0.1 2.81 2.78 2.75 2.75 2.74 2.74 2.73 2.73 2.73 2.72 2.72 2.71 2.71 2.7 2.69 2.7 2.7 2.7 2.7
0.3 2.75 2.72 2.7 2.69 2.68 2.68 2.69 2.69 2.69 2.68 2.68 2.67 2.67 2.66 2.66 2.67 2.66 2.66 2.66
0.5 2.7 2.66 2.67 2.66 2.67 2.66 2.65 2.66 2.66 2.65 2.64 2.64 2.64 2.63 2.63 2.63 2.63 2.63 2.63
0.6 2.68 2.66 2.64 2.64 2.63 2.65 2.65 2.65 2.65 2.64 2.63 2.63 2.63 2.62 2.61 2.61 2.61 2.62 2.62
0.7 2.67 2.63 2.62 2.63 2.62 2.63 2.64 2.63 2.64 2.64 2.64 2.62 2.61 2.61 2.61 2.61 2.61 2.62 2.61
0.9 2.73 2.61 2.6 2.6 2.61 2.61 2.62 2.61 2.61 2.61 2.61 2.6 2.61 2.61 2.62 2.61 2.61 2.61 2.61
1.0 2.85 2.65 2.59 2.59 2.6 2.6 2.59 2.6 2.59 2.59 2.59 2.61 2.6 2.61 2.61 2.61 2.61 2.61 2.61
1.1 3.04 2.71 2.62 2.59 2.59 2.6 2.6 2.6 2.58 2.59 2.59 2.59 2.59 2.59 2.6 2.6 2.6 2.61 2.6
1.2 3.31 2.86 2.65 2.62 2.59 2.58 2.57 2.58 2.58 2.58 2.58 2.59 2.59 2.59 2.58 2.58 2.59 2.6 2.6
1.3 3.52 3.04 2.75 2.66 2.62 2.57 2.57 2.56 2.56 2.57 2.57 2.58 2.59 2.59 2.59 2.59 2.58 2.58 2.58
1.5 4.0 3.47 3.06 2.89 2.8 2.64 2.6 2.58 2.59 2.56 2.56 2.55 2.56 2.56 2.57 2.58 2.58 2.58 2.59
1.7 4.41 3.87 3.43 3.26 3.07 2.83 2.74 2.67 2.64 2.6 2.58 2.54 2.56 2.55 2.55 2.55 2.55 2.57 2.57
1.9 4.64 4.26 3.8 3.61 3.41 3.12 2.99 2.86 2.79 2.73 2.64 2.57 2.56 2.54 2.56 2.56 2.55 2.55 2.55
2.0 4.72 4.38 3.98 3.77 3.59 3.29 3.13 3.01 2.88 2.81 2.68 2.62 2.57 2.56 2.56 2.55 2.56 2.56 2.55

@danpovey
Contributor

Great!!

# fgram means four-gram
fgram_rescored_lattices = rescore_with_whole_lattice(lattices, G,
                                                     lm_scale_list=None,
                                                     need_rescored_lats=True)
Contributor

I think we should return here when fgram_rescored_lattices is empty:

if fgram_rescored_lattices.num_arcs == 0:
    return dict()

I fixed the crash when running with batch size equal to one; see k2-fsa/k2#782.
But it still has some problems when running the transformer decoder with an empty input.

@Alex-Songs

Hi glynpu:
This is very cool work; is there a recipe to reproduce your results?
Thanks! @glynpu

@glynpu
Contributor Author

glynpu commented Jul 29, 2021

> This is very cool work; is there a recipe to reproduce your results?

The current PR is mainly about the decoding part, and #219 is about the corresponding training part.
Follow egs/librispeech/asr/simple_v1/bpe_run.sh in #219 and run stage 0 and stage 1 to reproduce my work. @Alex-Songs

@Alex-Songs

Thanks! @glynpu
