Add CTC training #3

Merged · 26 commits · Jul 31, 2021
Commits (26)
71c4e29  Add style check tools. (csukuangfj, Jul 15, 2021)
d146a4e  Remove mypy. (csukuangfj, Jul 15, 2021)
40eed74  Download LM for LibriSpeech. (csukuangfj, Jul 15, 2021)
0b19aa0  Compute features of librispeech and musan. (csukuangfj, Jul 19, 2021)
f25eedf  Fixes after review. (csukuangfj, Jul 19, 2021)
e005ea0  Minor fixes after review. (csukuangfj, Jul 20, 2021)
d5e0408  Add prepare_lang.py based on prepare_lang.sh (csukuangfj, Jul 20, 2021)
8a72901  Minor fixes. (csukuangfj, Jul 20, 2021)
a01d08f  Add self-loops to propagate disambiguation symbols. (csukuangfj, Jul 21, 2021)
f3542c7  Add CTC training. (csukuangfj, Jul 24, 2021)
2e33e24  Add CI test. (csukuangfj, Jul 24, 2021)
ee83a3e  Fix CI dependencies installation. (csukuangfj, Jul 24, 2021)
5443618  Fix CI. (csukuangfj, Jul 24, 2021)
a909592  Fix CI test errors. (csukuangfj, Jul 24, 2021)
00f8371  begin to add LM rescoring. (csukuangfj, Jul 24, 2021)
6f9fe5b  Refactor decoding code. (csukuangfj, Jul 24, 2021)
4a66712  Add LM rescoring. (csukuangfj, Jul 25, 2021)
8055bf3  Support DDP training. (csukuangfj, Jul 25, 2021)
78bb65e  Fix an error in DDP training. (csukuangfj, Jul 25, 2021)
d3101fb  Fix loading checkpoint in DDP training. (csukuangfj, Jul 26, 2021)
4ccae50  WIP: Begin to add BPE decoding (csukuangfj, Jul 26, 2021)
f65854c  Add BPE decoding results. (csukuangfj, Jul 27, 2021)
bd69e4b  Use attention decoder for rescoring. (csukuangfj, Jul 28, 2021)
acc63a9  WIP: Add BPE training code. (csukuangfj, Jul 29, 2021)
b94d97d  Disable gradient computation in evaluation mode. (csukuangfj, Jul 29, 2021)
398ed80  Minor fixes to support DDP training. (csukuangfj, Jul 31, 2021)
63 changes: 54 additions & 9 deletions egs/librispeech/ASR/local/prepare_lang.py
@@ -18,15 +18,13 @@
       lexicon = k2.Fsa.from_dict(d)
 
 5. Generate L_disambig.pt, in k2 format.
 
 6. Generate lexicon_disambig.txt
 """
 import math
-import re
-import sys
 from collections import defaultdict
 from pathlib import Path
-from typing import Dict, List, Tuple
+from typing import Any, Dict, List, Tuple
 
 import k2
 import torch
@@ -90,6 +88,10 @@ def write_lexicon(filename: str, lexicon: Lexicon) -> None:
 def write_mapping(filename: str, sym2id: Dict[str, int]) -> None:
     """Write a symbol to ID mapping to a file.
 
+    Note:
+      No need to implement `read_mapping` as it can be done
+      through :func:`k2.SymbolTable.from_file`.
+
     Args:
       filename:
         Filename to save the mapping.
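
(Not part of the diff.) The round trip the note describes would look like this; a minimal sketch, with an arbitrary file name:

    import k2

    # write_mapping() is defined in this file; it writes one "symbol id"
    # pair per line.
    write_mapping("phones.txt", {"<eps>": 0, "SIL": 1, "SPN": 2})

    # k2.SymbolTable.from_file() reads the same format back, so no
    # hand-written read_mapping() is needed.
    sym2id = k2.SymbolTable.from_file("phones.txt")
    assert sym2id["SIL"] == 1  # looking up a symbol yields its ID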
@@ -119,7 +121,7 @@ def get_phones(lexicon: Lexicon) -> List[str]:
     return sorted_ans
 
 
-def get_words(lexicon: List[Tuple[str, List[str]]]) -> List[str]:
+def get_words(lexicon: Lexicon) -> List[str]:
     """Get words from a lexicon.
 
     Args:
@@ -213,12 +215,46 @@ def generate_id_map(symbols: List[str]) -> Dict[str, int]:
     return {sym: i for i, sym in enumerate(symbols)}
 
 
+def add_self_loops(
+    arcs: List[List[Any]], disambig_phone: int, disambig_word: int
+) -> List[List[Any]]:
+    """Adds self-loops to states of an FST to propagate disambiguation symbols
+    through it. They are added on each state with non-epsilon output symbols
+    on at least one arc out of the state.
+
+    See also fstaddselfloops.pl from Kaldi. One difference is that
+    Kaldi uses OpenFst style FSTs and it has multiple final states.
+    This function uses k2 style FSTs and it does not need to add self-loops
+    to the final state.
+
+    Args:
+      arcs:
+        A list-of-list. The sublist contains
+        `[src_state, dest_state, label, aux_label, score]`
+
+    Return:
+      Return new `arcs` that contain self-loops.
+    """
+    states_needs_self_loops = set()
+    for arc in arcs:
+        src, dst, ilable, olable, score = arc
+        if olable != 0:

Collaborator commented on the line above:
    lable -> label

Collaborator (Author):
    Thanks. Fixed.

+            states_needs_self_loops.add(src)
+
+    ans = []
+    for s in states_needs_self_loops:
+        ans.append([s, s, disambig_phone, disambig_word, 0])
+
+    return arcs + ans
+
+
 def lexicon_to_fst(
     lexicon: Lexicon,
     phone2id: Dict[str, int],
     word2id: Dict[str, int],
     sil_phone: str = "SIL",
     sil_prob: float = 0.5,
+    need_self_loops: bool = False,
 ) -> k2.Fsa:
     """Convert a lexicon to an FST (in k2 format) with optional silence at
     the beginning and end of the word.
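
(Illustration only, not part of the diff.) A toy example of what add_self_loops does, under the arc convention documented in its docstring:

    # State 0 has an arc with a non-epsilon output label (5), so it gets a
    # self-loop carrying the disambiguation symbols; state 1 only emits
    # epsilon, so it does not.
    arcs = [
        [0, 1, 3, 5, 0.0],  # [src_state, dest_state, label, aux_label, score]
        [1, 2, 4, 0, 0.0],
    ]
    arcs = add_self_loops(arcs, disambig_phone=10, disambig_word=20)
    # arcs now additionally contains the self-loop [0, 0, 10, 20, 0].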
@@ -235,6 +271,9 @@ def lexicon_to_fst(
       sil_prob:
         The probability for adding a silence at the beginning and end
         of the word.
+      need_self_loops:
+        If True, add self-loop to states with non-epsilon output symbols
+        on at least one arc out of the state.
     Returns:
       Return an instance of `k2.Fsa` representing the given lexicon.
     """
@@ -285,6 +324,15 @@ def lexicon_to_fst(
         arcs.append([cur_state, loop_state, prons[i], w, no_sil_score])
         arcs.append([cur_state, sil_state, prons[i], w, sil_score])
 
+    if need_self_loops:
+        disambig_phone = phone2id["#0"]
+        disambig_word = word2id["#0"]
+        arcs = add_self_loops(
+            arcs,
+            disambig_phone=disambig_phone,
+            disambig_word=disambig_word,
+        )
+
     final_state = next_state
     arcs.append([loop_state, final_state, -1, -1, 0])
     arcs.append([final_state])
@@ -346,13 +394,10 @@ def main():
         word2id=word2id,
         sil_phone=sil_phone,
         sil_prob=sil_prob,
+        need_self_loops=True,
     )
 
-    # TODO(fangjun): add self-loops to L_disambig
-    # whose ilabel is phone2id['#0'] and olable is word2id['#0']
-    # Need to implement it in k2
-
-    if True:
+    if False:
         # Just for debugging, will remove it
         torch.save(L.as_dict(), out_dir / "L.pt")
         torch.save(L_disambig.as_dict(), out_dir / "L_disambig.pt")
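
(Not part of the diff.) When the debugging branch is enabled, the saved FSTs can be loaded back exactly as the module docstring describes; a minimal sketch, assuming the script wrote its output under data/lang:

    import k2
    import torch

    # Reload the serialized lexicon FST saved by the debugging branch above.
    d = torch.load("data/lang/L_disambig.pt")
    L_disambig = k2.Fsa.from_dict(d)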
12 changes: 12 additions & 0 deletions egs/librispeech/ASR/prepare.sh
@@ -75,3 +75,15 @@ if [ $stage -le 4 ] && [ $stop_stage -ge 4 ]; then
   mkdir -p data/fbank
   ./local/compute_fbank_musan.py
 fi
+
+if [ $stage -le 5 ] && [ $stop_stage -ge 5 ]; then
+  echo "Stage 5: Prepare phone based lang"
+  # TODO: add BPE based lang

Collaborator commented on the TODO above:
    Incidentally, something I want to try (I was working on this in Snowfall) is to generate a BPE lexicon parallel to the phone-based lexicon, and generate a dual lexicon that contains both phones (with disambig symbols) and BPE symbols. So we could call this a "dual lang directory". It would have words.txt, phones.txt, bpe.txt.

Collaborator:
    ... then, I was thinking, we could train on both phone and BPE symbols -- perhaps alternating them on different minibatches, if time is a concern. We can even decode like this, by manipulating/rescoring lattices.

Collaborator:
    Cool idea.
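
(Illustration only.) A rough sketch of the alternating-minibatch idea from this thread; phone_graph_compiler, bpe_graph_compiler, and compute_ctc_loss are hypothetical names invented here, not existing icefall APIs:

    for batch_idx, batch in enumerate(train_dl):
        # Alternate supervision units between minibatches:
        # phones on even batches, BPE pieces on odd ones.
        compiler = phone_graph_compiler if batch_idx % 2 == 0 else bpe_graph_compiler
        decoding_graph = compiler.compile(batch["supervisions"]["text"])

        loss = compute_ctc_loss(model, batch, decoding_graph)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()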

+  mkdir -p data/lang
+
+  (echo '!SIL SIL'; echo '<SPOKEN_NOISE> SPN'; echo '<UNK> SPN'; ) |
+    cat - data/lm/librispeech-lexicon.txt |
+    sort | uniq > data/lang/lexicon.txt
+
+  ./local/prepare_lang.py
+fi
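
The pipeline in this stage prepends pseudo-word entries for silence and spoken noise to the downloaded LibriSpeech lexicon before sorting. The first lines of the resulting data/lang/lexicon.txt would look roughly like this (the real-word pronunciations are illustrative):

    !SIL SIL
    <SPOKEN_NOISE> SPN
    <UNK> SPN
    A AH0
    ...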