Semi-supervised training using chain models #15

Open · wants to merge 291 commits into base: master

Commits (291)
2c43456
chain-smbr: Bug fixes
vimalmanohar Jun 22, 2017
6adc948
Chain SMBR fixes
vimalmanohar Jun 22, 2017
2959279
chain-smbr: Bug fixes
vimalmanohar Jun 22, 2017
51ec051
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 22, 2017
758e9a4
chain-smbr: Bug fix
vimalmanohar Jun 22, 2017
d364040
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 23, 2017
2f15292
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 24, 2017
d8db02d
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 25, 2017
9d97243
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 27, 2017
57d1016
temp
vimalmanohar Jun 22, 2017
a03b401
smbr-dash
vimalmanohar Jun 22, 2017
0682618
smbr without leaky
vimalmanohar Jun 24, 2017
62da39a
chain-smbr: Fix bugs in chain smbr
vimalmanohar Jun 27, 2017
5b7879d
smbr training
vimalmanohar Jun 27, 2017
378267b
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 28, 2017
a973632
Adding missing chain-smbr-kernels.cu
vimalmanohar Jun 29, 2017
e7d9d52
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 29, 2017
55d3321
Add phone-insertion-penalty + minor updates
hhadian Jun 29, 2017
0a19c27
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jun 30, 2017
f776b3a
Minor bug fixes
vimalmanohar Jun 30, 2017
d1b872c
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 1, 2017
c11756d
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 4, 2017
8fd9f19
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 7, 2017
f37c374
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 8, 2017
a89d02d
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 9, 2017
4c86384
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 10, 2017
774d78e
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 12, 2017
845f27b
chain-smbr: Adding smbr
vimalmanohar Jul 12, 2017
545154a
added scripts for new weight transfer method for transferring all lay…
pegahgh Jul 14, 2017
5248c1a
merged with master
pegahgh Jul 14, 2017
40c85dc
updated PR w.r.t comments.
pegahgh Jul 14, 2017
39a731f
small fix to parser.py.
pegahgh Jul 14, 2017
970842e
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 15, 2017
c1996ff
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 17, 2017
72480ec
fixed issues w.r.t. comments (except prepare_wsj_rm_lang.sh).
pegahgh Jul 17, 2017
7559d3a
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 18, 2017
e0d43a6
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 20, 2017
4a217ea
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Jul 22, 2017
48d8161
chain: Fixes for silence
vimalmanohar Jul 23, 2017
9fedda9
chain: Updating chain script
vimalmanohar Jul 23, 2017
de34ec4
Merging masterR
vimalmanohar Jul 23, 2017
e51826a
fixed small issue with language-model.*.
pegahgh Jul 29, 2017
d64d017
semisup: Updating semisupervised scripts
vimalmanohar Aug 4, 2017
0a6b824
added new Xconfig layer to parse existing model and modified run_tdnn…
pegahgh Aug 6, 2017
e830a04
modified scripts to accept --trainer.input-model and prepare *.fst ou…
pegahgh Aug 9, 2017
49bcf2e
removed changes to language-model.* and generated weighted phone lm u…
pegahgh Aug 10, 2017
d25e63a
optimized alignment processing stage in weighted phone lm generation.
pegahgh Aug 10, 2017
f2d01ae
added check to have possitive int as phone lm weights.
pegahgh Aug 10, 2017
293c531
fixed small issue with train_dnn.py.
pegahgh Aug 10, 2017
2462cf5
merged with kaldi/master.
pegahgh Aug 10, 2017
5b510f9
fixed some issues.
pegahgh Aug 11, 2017
ac95720
fixed some issues.
pegahgh Aug 15, 2017
ed8b952
fixed some comments and removed some options.
pegahgh Aug 17, 2017
b92a63a
semisup: Adding some extra script for semi-supervised recipes
vimalmanohar Aug 17, 2017
7a9ef54
fixed src dirs options for transfer learning scripts 1{a,b,c} and mod…
pegahgh Aug 17, 2017
4d8ec90
semisup: Merging from master
vimalmanohar Aug 18, 2017
775b34d
minor change to prepare for tf learning
vimalmanohar Aug 23, 2017
a2d5e62
semisup: Merging transfer learning
vimalmanohar Aug 23, 2017
e0fd23e
semisup: Separate tolerance for silence
vimalmanohar Aug 23, 2017
405af6c
Merge branch 'chain-smbr' of github.com:vimalmanohar/kaldi into semis…
vimalmanohar Aug 23, 2017
89e574b
modified comments in xconfig and train.py and modified scripts to gen…
pegahgh Aug 24, 2017
eb00983
small fix.
pegahgh Aug 24, 2017
ef7275b
fixed old comments and added new comments.
pegahgh Aug 24, 2017
82fa510
fixed some issues in python codes using pylint package.
pegahgh Aug 24, 2017
40dc5e4
smbr: Fix aux objf
vimalmanohar Aug 24, 2017
bd20bdf
semisup: Merge chain-smbr
vimalmanohar Aug 24, 2017
1a74866
semisup: Merge chain-smbr
vimalmanohar Aug 24, 2017
a856dea
Update parser.py
pegahgh Aug 26, 2017
55a64ff
Update run_tdnn_wsj_rm_1c.sh
pegahgh Aug 30, 2017
c2593d8
Update basic_layers.py
pegahgh Aug 30, 2017
26b4ddd
Update parser.py
pegahgh Aug 30, 2017
90fc04a
chain: objective function fixes
vimalmanohar Sep 1, 2017
d811e15
semisup: Minor fixes to chain semisup
vimalmanohar Sep 1, 2017
af050b6
semisup: Add more recipes
vimalmanohar Sep 1, 2017
82daf84
Update xconfig_to_configs.py
vimalmanohar Sep 2, 2017
f6bea67
semisup: Merging transfer learning
vimalmanohar Sep 2, 2017
f88a115
Merge pull request #12 from vimalmanohar/patch-4
pegahgh Sep 2, 2017
ed63b19
Update make_weighted_den_fst.sh
vimalmanohar Sep 3, 2017
43d1fe2
Merge pull request #13 from vimalmanohar/patch-5
pegahgh Sep 4, 2017
125abf0
fixed small issues.
pegahgh Sep 6, 2017
f51492b
fixed small issue.
pegahgh Sep 6, 2017
ba308ea
modified make_weighted_den_fst.sh
pegahgh Sep 10, 2017
8fae871
modified weighted_den_fst.sh
pegahgh Sep 10, 2017
6f5e8eb
fixed some issues.
pegahgh Sep 12, 2017
3985924
fixed some small issues.
pegahgh Sep 12, 2017
17bb56f
Merge branch 'master' into transfer-learning-wsj-rm
danpovey Sep 13, 2017
fe07c0b
[scripts] Cosmetic and other improvements to make_weighted_den_fst.sh…
danpovey Sep 13, 2017
b5ce647
smbr: Logging bug fix
vimalmanohar Sep 13, 2017
967531d
semisup: Extend trivial output layer
vimalmanohar Sep 13, 2017
e5e57ee
temp fix
vimalmanohar Sep 13, 2017
9ff681a
Merging from transfer learning
vimalmanohar Sep 13, 2017
a34655c
Merge branch 'transfer_learning' of github.com:danpovey/kaldi into se…
vimalmanohar Sep 13, 2017
d61cb4b
semisup: Adding lattice splitting chain code
vimalmanohar Sep 25, 2017
8772dba
semisup: Adding tolerances to lattices
vimalmanohar Oct 3, 2017
339c435
Old tolerance approach
vimalmanohar Oct 11, 2017
e90ca23
semisup: adding mbr supervision
vimalmanohar Oct 16, 2017
ea6ed69
semisup: Adding semisup recipes
vimalmanohar Oct 16, 2017
bacca8b
Minor bug fix in get_egs.sh
vimalmanohar Oct 17, 2017
417ecfd
Best path system recipe
vimalmanohar Oct 17, 2017
6f0de80
Add some minor check
vimalmanohar Oct 18, 2017
c6aa0e4
Updates to work with RNNLM
vimalmanohar Oct 19, 2017
c22bd48
Fix tolerance fst
vimalmanohar Oct 20, 2017
0d8af58
Minor fix to _m
Oct 20, 2017
f0c9fe1
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
Oct 20, 2017
5bfdd39
Tolerance fst fixed
vimalmanohar Oct 22, 2017
37cafe8
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
Oct 22, 2017
479e769
semisup: Fixing some bugs and making cleaner scripts
Oct 27, 2017
a3c3703
minor changes
vimalmanohar Oct 27, 2017
90e88ba
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Oct 27, 2017
bf10730
semisup: Changes to get_egs
Oct 27, 2017
18093ae
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Oct 27, 2017
0bbd2ce
semisup: Adding 100k experiments
Oct 29, 2017
99b8fc1
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Oct 29, 2017
f392d74
Changed permissions
vimalmanohar Oct 30, 2017
05ba2d9
Binaries for undeterminized lattices
Nov 2, 2017
fcefeaa
semisup: Adding tfrnnlm scripts
vimalmanohar Nov 2, 2017
a0572b5
semisup: Undeterminized lattices recipes
Nov 6, 2017
8a035ab
semisup-smbr: Bug fix in 15k_s
vimalmanohar Nov 6, 2017
155b90a
Undo _s changes
vimalmanohar Nov 6, 2017
34f780a
semisup-smbr: Adding undeterminized version of rescoring
Nov 6, 2017
62b0f3b
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Nov 6, 2017
0651075
semisup-smbr: Fix undeterminized lattice rescoring
Nov 6, 2017
35afc06
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Nov 6, 2017
eadc843
semisup: 50 hours recipe
vimalmanohar Nov 12, 2017
c5acc17
semisup: Pocolm for fisher english
vimalmanohar Nov 14, 2017
f71741a
semisup: Fix lattice rescoring
Nov 17, 2017
5103952
semisup: Code changes for undeterminized lattices
Nov 17, 2017
fc472c3
semisup: Adding more recipes
Nov 17, 2017
010bc4e
semisup: Unk model on Fisher
vimalmanohar Nov 17, 2017
d43125b
semisup: Bug fix in ivectors in semi-supervised scenario
vimalmanohar Nov 17, 2017
82efedb
semisup: Minor fixes to scripts
vimalmanohar Nov 20, 2017
e3b7d72
semisup-smbr: Re-organizing stuff
vimalmanohar Nov 28, 2017
76cc0a0
semisup-smbr: Adding more recipes
vimalmanohar Nov 28, 2017
47ab45a
semisup-smbr: Add stages to scoring scripts
vimalmanohar Nov 28, 2017
37bb897
semisup: unk model script
vimalmanohar Dec 1, 2017
42e9065
semisup-smbr: Add more recipes with UNK model
vimalmanohar Dec 6, 2017
df09133
SWBD stats pooling VAD recipe
Jan 9, 2018
b9c7161
Add SWBD VAD recipe
GoVivace Jan 9, 2018
36747c4
path.sh convention and comments update
GoVivace Jan 11, 2018
6390477
add options for noise and reverberations
GoVivace Jan 12, 2018
b62c2a8
Fix bugs in evaluations part
GoVivace Jan 16, 2018
0e10018
semisup-smbr: add one-silence-class and exclude-silence options for c…
vimalmanohar Jan 22, 2018
803a576
semisup-smbr: Some script level changes for smbr
vimalmanohar Jan 22, 2018
2839e2e
semisup: Add num-copies option for combine_egs.sh
vimalmanohar Jan 22, 2018
63858b8
semisup-smbr: Add more run-level scripts
vimalmanohar Jan 22, 2018
5e7ca16
semisup-smbr: Choose egs script inside train.py
vimalmanohar Jan 24, 2018
63c47b1
semisup-smbr: Update aspire recipe
vimalmanohar Jan 24, 2018
bea828d
semisup-smbr: Merging from master
vimalmanohar Jan 24, 2018
e8fc4f7
Minor fixes
vimalmanohar Jan 24, 2018
e4e2145
t Merge branch 'master' of github.com:kaldi-asr/kaldi into swbd_stats…
vimalmanohar Jan 25, 2018
ce3cbab
Simplifying recipe
vimalmanohar Jan 25, 2018
b43e5dc
simplifying stuff
vimalmanohar Jan 26, 2018
d3c11fb
Modifying the way mmi and smbr are done in lattice-free
vimalmanohar Feb 2, 2018
07f636b
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into asp…
vimalmanohar Feb 2, 2018
a1224ee
Minor bug fix
vimalmanohar Feb 5, 2018
0ac68b3
Merge branch 'swbd_stats_vad_recipe' of github.com:vimalmanohar/kaldi…
vimalmanohar Feb 5, 2018
9b809b9
Merging from golden master
vimalmanohar Feb 13, 2018
715f219
added new functions to accept NnetExample in nnet-chain-training.cc.
pegahgh Feb 13, 2018
474e865
LF-SMBR training
vimalmanohar Feb 20, 2018
1dbb317
merging from master
vimalmanohar Feb 20, 2018
91889b1
Aspire changes
vimalmanohar Feb 21, 2018
ae22eec
fixed issues w.r.t comments (part 1).
pegahgh Feb 22, 2018
7ac09a3
Adding ML and separate MMI factors
vimalmanohar Feb 26, 2018
f94738f
modfied functions to accept new sort (sort by t and then n) in nnet3-…
pegahgh Feb 26, 2018
40fa154
fixed some issues.
pegahgh Feb 27, 2018
6437c04
chain-smbr: Minor bug fixes
vimalmanohar Mar 1, 2018
7a1389f
Changes for ts learning
vimalmanohar Mar 1, 2018
1c434f7
Merging from master
vimalmanohar Mar 1, 2018
70b4d88
Changes related to python3
vimalmanohar Mar 7, 2018
addb032
Fix subsampling factor in nnet3 egs
vimalmanohar Mar 8, 2018
9d5f562
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 9, 2018
0fce78b
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 10, 2018
4c7380c
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 12, 2018
1f024bd
Fix for segmentation
vimalmanohar Mar 12, 2018
ad3fe8f
ts learning from post
vimalmanohar Mar 13, 2018
a511b0f
update aspire sad
vimalmanohar Mar 13, 2018
a11edcc
Merging segmentation fix
vimalmanohar Mar 13, 2018
be6b95a
Support multiple smbr-factors for outputs
vimalmanohar Mar 13, 2018
0a7b35a
Merging kaldi master
vimalmanohar Mar 13, 2018
b240144
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 14, 2018
c4b0f05
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 15, 2018
a5799c9
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 16, 2018
49de351
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 17, 2018
50df8ba
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 18, 2018
54305b4
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 19, 2018
bf0ee42
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 20, 2018
e5db432
Re-organize sequence kl training
vimalmanohar Mar 20, 2018
0817ffd
Changes for sequence KL training
vimalmanohar Mar 20, 2018
126b86d
Merge semisup-smbr
vimalmanohar Mar 20, 2018
94dc65a
Multiple smbr factors for outputs
vimalmanohar Mar 20, 2018
af1712f
merging smbr factors
vimalmanohar Mar 20, 2018
a2f2c10
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 21, 2018
4cb144c
adding kl factors
vimalmanohar Mar 21, 2018
7fc5b5a
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 22, 2018
de002de
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 23, 2018
966ae60
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 25, 2018
534692a
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 27, 2018
b375f2a
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 28, 2018
b94b30e
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 29, 2018
e8b4f50
Merge branch 'master' of github.com:kaldi-asr/kaldi
vimalmanohar Mar 31, 2018
b596a4c
Merging code from master
vimalmanohar Mar 31, 2018
bff2dc8
TS changes
vimalmanohar Mar 31, 2018
e75db2b
Merging semisup-smbr
vimalmanohar Apr 1, 2018
d9b8949
Fix bugs
vimalmanohar Apr 2, 2018
a7144a8
Bug fix
vimalmanohar Apr 2, 2018
88c03ec
Minor bug fixes
vimalmanohar Apr 2, 2018
cf40932
Merging KL
vimalmanohar Apr 2, 2018
dfb891f
Minor bug fixes
vimalmanohar Apr 2, 2018
975c130
Fix issues related to egs
vimalmanohar Apr 3, 2018
cbd87b7
Changes to KL training
vimalmanohar Apr 11, 2018
173f0c6
Minor fix
vimalmanohar Apr 18, 2018
90518d7
Adding graph post
vimalmanohar Apr 23, 2018
2cf2acb
kl-latdir fix log dir
vimalmanohar Apr 23, 2018
6564ff1
semisup: Bug fix in context info
vimalmanohar Apr 25, 2018
ab65a20
Adding unconstrained egs
vimalmanohar Apr 27, 2018
b3ff9cb
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Apr 27, 2018
d0caae4
Adding new recipes
vimalmanohar Apr 27, 2018
5b0af2e
Minor fix
vimalmanohar May 1, 2018
3f7d29a
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar May 1, 2018
ed30167
Add subsplit support
May 2, 2018
2ec7b4a
Merge branch 'semisup-smbr' of https://github.com/vimalmanohar/kaldi …
May 2, 2018
f2d3994
Add sub_split option to rescoring
vimalmanohar May 3, 2018
34379bd
Merging master
vimalmanohar Jun 5, 2018
fcc156f
Fixing smbr recipe
vimalmanohar Jun 5, 2018
901646d
Merging from golden master
vimalmanohar Jun 5, 2018
9ab70b2
Adding semisup prep script for multi_en
Jun 7, 2018
4cdee4a
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Jun 8, 2018
cac2df4
More robust utt2dur.sh
Jun 8, 2018
316a1c8
Check if entire wavefile should be read
Jun 8, 2018
8227185
Merge utt2dur changes
vimalmanohar Jun 8, 2018
fe76cbf
Merge branch 'semisup-smbr' of github.com:vimalmanohar/kaldi into sem…
vimalmanohar Jun 8, 2018
fe41373
Merge branch 'master' of github.com:kaldi-asr/kaldi into semisup-smbr
vimalmanohar Jun 11, 2018
d649528
Add combine_queue_opt
vimalmanohar Jun 24, 2018
ec21ebc
combine_queue_opt in train.py
vimalmanohar Jun 24, 2018
014a54d
Moving functions and cleaning up
vimalmanohar Jun 27, 2018
1a77f84
Fixes to aspire related scripts
vimalmanohar Jun 27, 2018
dc69be0
numerator post in get_egs_split.sh
vimalmanohar Jun 27, 2018
0bc2b64
Adding train_queue_opt
vimalmanohar Jun 27, 2018
0b34eb2
Adding semisup ts learning
vimalmanohar Jun 28, 2018
9c87d0d
aspire: Adding aspire clean and ts recipes
vimalmanohar Jun 28, 2018
11914db
Merging master
vimalmanohar Jul 22, 2018
524ae09
Some aspire recipes
vimalmanohar Aug 1, 2018
5295d46
Merging origin
vimalmanohar Aug 1, 2018
865ae0f
Change the way chain egs normalization is done and sMBR supervision
vimalmanohar Aug 20, 2018
029eeed
Bug fix in combination objective
vimalmanohar Aug 22, 2018
d0de661
Minor fix
vimalmanohar Aug 22, 2018
59aa912
Minor fix
vimalmanohar Aug 22, 2018
06cf176
Updating babel recipe
vimalmanohar Aug 24, 2018
059eb67
semisup:Compose den lat before splitting
vimalmanohar Aug 24, 2018
97ffada
Update basic_layers.py
vimalmanohar Apr 23, 2019
106 changes: 27 additions & 79 deletions egs/ami/s5b/local/nnet3/multi_condition/run_ivector_common.sh
@@ -10,30 +10,25 @@ set -e -o pipefail
stage=1
mic=ihm
nj=30
- min_seg_len=1.55 # min length in seconds... we do this because chain training
- # will discard segments shorter than 1.5 seconds. Must remain in sync with
- # the same option given to prepare_lores_feats.sh.
train_set=train_cleaned # you might set this to e.g. train_cleaned.
gmm=tri3_cleaned # This specifies a GMM-dir from the features of the type you're training the system on;
# it should contain alignments for 'train_set'.

+ norvb_datadir=data/ihm/train_cleaned_sp

num_threads_ubm=32
+ rvb_affix=_rvb
nnet3_affix=_cleaned # affix for exp/$mic/nnet3 directory to put iVector stuff in, so it
# becomes exp/$mic/nnet3_cleaned or whatever.
num_data_reps=1
+ sample_rate=16000

+ max_jobs_run=10

. ./cmd.sh
. ./path.sh
. ./utils/parse_options.sh

+ nnet3_affix=${nnet3_affix}$rvb_affix

gmmdir=exp/${mic}/${gmm}


- for f in data/${mic}/${train_set}/feats.scp ${gmmdir}/final.mdl; do
+ for f in data/${mic}/${train_set}/feats.scp; do
if [ ! -f $f ]; then
echo "$0: expected file $f to exist"
exit 1
@@ -73,35 +68,22 @@ if [ $stage -le 1 ]; then

for datadir in ${train_set}_sp dev eval; do
steps/make_mfcc.sh --nj $nj --mfcc-config conf/mfcc_hires.conf \
--cmd "$train_cmd" data/$mic/${datadir}_hires
--cmd "$train_cmd --max-jobs-run $max_jobs_run" data/$mic/${datadir}_hires
steps/compute_cmvn_stats.sh data/$mic/${datadir}_hires
utils/fix_data_dir.sh data/$mic/${datadir}_hires
done
fi

- if [ $stage -le 2 ]; then
- echo "$0: combining short segments of speed-perturbed high-resolution MFCC training data"
- # we have to combine short segments or we won't be able to train chain models
- # on those segments.
- utils/data/combine_short_segments.sh \
- data/${mic}/${train_set}_sp_hires $min_seg_len data/${mic}/${train_set}_sp_hires_comb
-
- # just copy over the CMVN to avoid having to recompute it.
- cp data/${mic}/${train_set}_sp_hires/cmvn.scp data/${mic}/${train_set}_sp_hires_comb/
- utils/fix_data_dir.sh data/${mic}/${train_set}_sp_hires_comb/
- fi

if [ $stage -le 3 ]; then
echo "$0: creating reverberated MFCC features"

- datadir=data/ihm/train_cleaned_sp

- mfccdir=${datadir}_rvb${num_data_reps}_hires/data
+ mfccdir=${norvb_datadir}_rvb${num_data_reps}_hires/data
if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $mfccdir/storage ]; then
utils/create_split_dir.pl /export/b0{5,6,7,8}/$USER/kaldi-data/egs/ami-$mic-$(date +'%m_%d_%H_%M')/s5/$mfccdir/storage $mfccdir/storage
fi

- if [ ! -f ${datadir}_rvb${num_data_reps}_hires/feats.scp ]; then
+ if [ ! -f ${norvb_datadir}_rvb${num_data_reps}_hires/feats.scp ]; then
if [ ! -d "RIRS_NOISES" ]; then
# Download the package that includes the real RIRs, simulated RIRs, isotropic noises and point-source noises
wget --no-check-certificate http://www.openslr.org/resources/28/rirs_noises.zip
@@ -123,60 +105,27 @@ if [ $stage -le 3 ]; then
--isotropic-noise-addition-probability 1 \
--num-replications ${num_data_reps} \
--max-noises-per-minute 1 \
- --source-sampling-rate 16000 \
- ${datadir} ${datadir}_rvb${num_data_reps}
+ --source-sampling-rate $sample_rate \
+ ${norvb_datadir} ${norvb_datadir}_rvb${num_data_reps}

- utils/copy_data_dir.sh ${datadir}_rvb${num_data_reps} ${datadir}_rvb${num_data_reps}_hires
- utils/data/perturb_data_dir_volume.sh ${datadir}_rvb${num_data_reps}_hires
+ utils/copy_data_dir.sh ${norvb_datadir}_rvb${num_data_reps} ${norvb_datadir}_rvb${num_data_reps}_hires
+ utils/data/perturb_data_dir_volume.sh ${norvb_datadir}_rvb${num_data_reps}_hires

steps/make_mfcc.sh --nj $nj --mfcc-config conf/mfcc_hires.conf \
--cmd "$train_cmd" ${datadir}_rvb${num_data_reps}_hires
steps/compute_cmvn_stats.sh ${datadir}_rvb${num_data_reps}_hires
utils/fix_data_dir.sh ${datadir}_rvb${num_data_reps}_hires

utils/data/combine_short_segments.sh \
${datadir}_rvb${num_data_reps}_hires $min_seg_len ${datadir}_rvb${num_data_reps}_hires_comb

# just copy over the CMVN to avoid having to recompute it.
cp ${datadir}_rvb${num_data_reps}_hires/cmvn.scp ${datadir}_rvb${num_data_reps}_hires_comb/
utils/fix_data_dir.sh ${datadir}_rvb${num_data_reps}_hires_comb/
--cmd "$train_cmd --max-jobs-run $max_jobs_run" ${norvb_datadir}_rvb${num_data_reps}_hires
steps/compute_cmvn_stats.sh ${norvb_datadir}_rvb${num_data_reps}_hires
utils/fix_data_dir.sh ${norvb_datadir}_rvb${num_data_reps}_hires
fi

- utils/combine_data.sh data/${mic}/${train_set}_sp_rvb_hires data/${mic}/${train_set}_sp_hires ${datadir}_rvb${num_data_reps}_hires
- utils/combine_data.sh data/${mic}/${train_set}_sp_rvb_hires_comb data/${mic}/${train_set}_sp_hires_comb ${datadir}_rvb${num_data_reps}_hires_comb
+ utils/combine_data.sh data/${mic}/${train_set}_sp_rvb_hires data/${mic}/${train_set}_sp_hires ${norvb_datadir}_rvb${num_data_reps}_hires
fi


if [ $stage -le 4 ]; then
echo "$0: selecting segments of hires training data that were also present in the"
echo " ... original training data."

# note, these data-dirs are temporary; we put them in a sub-directory
# of the place where we'll make the alignments.
temp_data_root=exp/$mic/nnet3${nnet3_affix}/tri5
mkdir -p $temp_data_root

utils/data/subset_data_dir.sh --utt-list data/${mic}/${train_set}/feats.scp \
data/${mic}/${train_set}_sp_hires $temp_data_root/${train_set}_hires

# note: essentially all the original segments should be in the hires data.
n1=$(wc -l <data/${mic}/${train_set}/feats.scp)
n2=$(wc -l <$temp_data_root/${train_set}_hires/feats.scp)
if [ $n1 != $n1 ]; then
echo "$0: warning: number of feats $n1 != $n2, if these are very different it could be bad."
fi

echo "$0: training a system on the hires data for its LDA+MLLT transform, in order to produce the diagonal GMM."
if [ -e exp/$mic/nnet3${nnet3_affix}/tri5/final.mdl ]; then
# we don't want to overwrite old stuff, ask the user to delete it.
echo "$0: exp/$mic/nnet3${nnet3_affix}/tri5/final.mdl already exists: "
echo " ... please delete and then rerun, or use a later --stage option."
exit 1;
fi
steps/train_lda_mllt.sh --cmd "$train_cmd" --num-iters 7 --mllt-iters "2 4 6" \
--splice-opts "--left-context=3 --right-context=3" \
3000 10000 $temp_data_root/${train_set}_hires data/lang \
$gmmdir exp/$mic/nnet3${nnet3_affix}/tri5
steps/online/nnet2/get_pca_transform.sh --cmd "$train_cmd" \
--splice-opts "--left-context=3 --right-context=3" \
--max-utts 30000 --subsample 2 \
data/${mic}/${train_set}_sp_rvb_hires \
exp/$mic/nnet3${nnet3_affix}/pca_transform
fi


@@ -186,9 +135,8 @@ if [ $stage -le 5 ]; then
mkdir -p exp/$mic/nnet3${nnet3_affix}/diag_ubm
temp_data_root=exp/$mic/nnet3${nnet3_affix}/diag_ubm

- # train a diagonal UBM using a subset of about a quarter of the data
- # we don't use the _comb data for this as there is no need for compatibility with
- # the alignments, and using the non-combined data is more efficient for I/O
+ # train a diagonal UBM using a subset of about a quarter of the data,
+ # and using the non-combined data is more efficient for I/O
# (no messing about with piped commands).
num_utts_total=$(wc -l <data/$mic/${train_set}_sp_rvb_hires/utt2spk)
num_utts=$[$num_utts_total/4]
@@ -201,7 +149,7 @@ if [ $stage -le 5 ]; then
--num-frames 700000 \
--num-threads $num_threads_ubm \
${temp_data_root}/${train_set}_sp_rvb_hires_subset 512 \
- exp/$mic/nnet3${nnet3_affix}/tri5 exp/$mic/nnet3${nnet3_affix}/diag_ubm
+ exp/$mic/nnet3${nnet3_affix}/pca_transform exp/$mic/nnet3${nnet3_affix}/diag_ubm
fi

if [ $stage -le 6 ]; then
@@ -217,7 +165,7 @@ if [ $stage -le 7 ]; then
# note, we don't encode the 'max2' in the name of the ivectordir even though
# that's the data we extract the ivectors from, as it's still going to be
# valid for the non-'max2' data, the utterance list is the same.
- ivectordir=exp/$mic/nnet3${nnet3_affix}/ivectors_${train_set}_sp_rvb_hires_comb
+ ivectordir=exp/$mic/nnet3${nnet3_affix}/ivectors_${train_set}_sp_rvb_hires
if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $ivectordir/storage ]; then
utils/create_split_dir.pl /export/b0{5,6,7,8}/$USER/kaldi-data/egs/ami-$mic-$(date +'%m_%d_%H_%M')/s5/$ivectordir/storage $ivectordir/storage
fi
@@ -231,10 +179,10 @@ if [ $stage -le 7 ]; then
# handle per-utterance decoding well (iVector starts at zero).
temp_data_root=${ivectordir}
utils/data/modify_speaker_info.sh --utts-per-spk-max 2 \
- data/${mic}/${train_set}_sp_rvb_hires_comb ${temp_data_root}/${train_set}_sp_rvb_hires_comb_max2
+ data/${mic}/${train_set}_sp_rvb_hires ${temp_data_root}/${train_set}_sp_rvb_hires_max2

steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj $nj \
- ${temp_data_root}/${train_set}_sp_rvb_hires_comb_max2 \
+ ${temp_data_root}/${train_set}_sp_rvb_hires_max2 \
exp/$mic/nnet3${nnet3_affix}/extractor $ivectordir

# Also extract iVectors for the test data, but in this case we don't need the speed
15 changes: 0 additions & 15 deletions egs/ami/s5b/local/nnet3/prepare_lores_feats.sh
@@ -14,9 +14,6 @@ set -e -o pipefail
stage=0
mic=ihm
nj=30
- min_seg_len=1.55 # min length in seconds... we do this because chain training
- # will discard segments shorter than 1.5 seconds. Must remain in
- # sync with the same option given to run_ivector_common.sh.
use_ihm_ali=false # If true, we use alignments from the IHM data (which is better..
# don't set this to true if $mic is set to ihm.)
train_set=train # you might set this to e.g. train_cleaned.
@@ -69,16 +66,4 @@ if [ $stage -le 9 ]; then
utils/fix_data_dir.sh data/${mic}/${train_set}${ihm_suffix}_sp
fi

- if [ $stage -le 10 ]; then
- echo "$0: combining short segments of 13-dimensional speed-perturbed ${maybe_ihm}MFCC data"
- src=data/${mic}/${train_set}${ihm_suffix}_sp
- dest=data/${mic}/${train_set}${ihm_suffix}_sp_comb
- utils/data/combine_short_segments.sh $src $min_seg_len $dest
- # re-use the CMVN stats from the source directory, since it seems to be slow to
- # re-compute them after concatenating short segments.
- cp $src/cmvn.scp $dest/
- utils/fix_data_dir.sh $dest
- fi


exit 0;
2 changes: 1 addition & 1 deletion egs/ami/s5b/local/prepare_parallel_train_data.sh
@@ -60,6 +60,6 @@ utils/apply_map.pl -f 1 $tmpdir/ihmutt2utt <data/ihm/train/segments >data/$mic/t

utils/fix_data_dir.sh data/$mic/train_ihmdata

- rm $tmpdir/ihmutt2utt
+ #rm $tmpdir/ihmutt2utt

exit 0;
2 changes: 2 additions & 0 deletions egs/ami/s5b/path.sh
@@ -11,3 +11,5 @@ BEAMFORMIT=$KALDI_ROOT/tools/BeamformIt

export PATH=$PATH:$LMBIN:$BEAMFORMIT:$SRILM

+ . /etc/profile.d/modules.sh
+ module load shared cuda80/toolkit
3 changes: 3 additions & 0 deletions egs/ami/s5b/run.sh
@@ -31,6 +31,7 @@ case $(hostname -d) in
fit.vutbr.cz) AMI_DIR=/mnt/matylda5/iveselyk/KALDI_AMI_WAV ;; # BUT,
clsp.jhu.edu) AMI_DIR=/export/corpora4/ami/amicorpus ;; # JHU,
cstr.ed.ac.uk) AMI_DIR= ;; # Edinburgh,
+ cm.gemini) AMI_DIR=/export/common/data/corpora/amicorpus;; # COE
esac

[ ! -r data/local/lm/final_lm ] && echo "Please, run 'run_prepare_shared.sh' first!" && exit 1
@@ -163,6 +164,8 @@ if [ $stage -le 10 ]; then
local/run_cleanup_segmentation.sh --mic $mic
fi

+ exit 0
+
if [ $stage -le 11 ]; then
ali_opt=
[ "$mic" != "ihm" ] && ali_opt="--use-ihm-ali true"
1 change: 1 addition & 0 deletions egs/ami/s5b/run_prepare_shared.sh
@@ -8,6 +8,7 @@ case $(hostname -d) in
fit.vutbr.cz) FISHER_TRANS=/mnt/matylda2/data/FISHER/fe_03_p1_tran ;; # BUT,
clsp.jhu.edu) FISHER_TRANS=/export/corpora4/ami/fisher_trans/part1 ;; # JHU,
cstr.ed.ac.uk) FISHER_TRANS=`pwd`/eddie_data/lm/data/fisher/part1 ;; # Edinburgh,
+ cm.gemini) FISHER_TRANS=/export/common/data/corpora/LDC/LDC2004T19_CLSP_format/fe_03_p1_tran/;; # COE
*) echo "Please modify the script to add your loaction of the Fisher transcripts, or modify this script."; exit 1;;
esac
# Or select manually,
1 change: 1 addition & 0 deletions egs/aspire/s5/conf/mfcc_hires.conf
@@ -8,3 +8,4 @@
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=40 # low cutoff frequency for mel bins
--high-freq=-200 # high cutoff frequently, relative to Nyquist of 4000 (=3800)
+ --allow-downsample=true
21 changes: 11 additions & 10 deletions egs/aspire/s5/local/chain/tuning/run_tdnn_lstm_1a.sh
@@ -20,6 +20,7 @@ test_stage=1
nj=70

tdnn_affix=_1a
+ chain_affix=

hidden_dim=1024
cell_dim=1024
@@ -62,9 +63,9 @@ fi
train_set=train_rvb

gmm_dir=exp/tri5a # used to get training lattices (for chain supervision)
- treedir=exp/chain/tree_bi_a
- lat_dir=exp/chain/tri5a_${train_set}_lats # training lattices directory
- dir=exp/chain/tdnn_lstm${tdnn_affix}
+ treedir=exp/chain${chain_affix}/tree_bi_a
+ lat_dir=exp/chain${chain_affix}/tri5a_${train_set}_lats # training lattices directory
+ dir=exp/chain${chain_affix}/tdnn_lstm${tdnn_affix}
train_data_dir=data/${train_set}_hires
train_ivector_dir=exp/nnet3/ivectors_${train_set}
lang=data/lang_chain
@@ -77,7 +78,7 @@ local/nnet3/run_ivector_common.sh --stage $stage --num-data-reps 3 || exit 1

mkdir -p $dir

- norvb_lat_dir=exp/chain/tri5a_train_lats
+ norvb_lat_dir=exp/chain${chain_affix}/tri5a_train_lats

if [ $stage -le 7 ]; then
# Get the alignments as lattices (gives the chain training more freedom).
@@ -257,10 +258,10 @@ if [ $stage -le 15 ]; then

for d in dev_rvb test_rvb; do
(
- if [ ! -f exp/nnet3/ivectors_${d}/ivector_online.scp ]; then
+ if [ ! -f exp/nnet3${nnet3_affix}/ivectors_${d}/ivector_online.scp ]; then
steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 30 \
- data/${d}_hires exp/nnet3/extractor \
- exp/nnet3/ivectors_${d} || { echo "Failed i-vector extraction for data/${d}_hires"; touch $dir/.error; }
+ data/${d}_hires exp/nnet3${nnet3_affix}/extractor \
+ exp/nnet3${nnet3_affix}/ivectors_${d} || { echo "Failed i-vector extraction for data/${d}_hires"; touch $dir/.error; }
fi

decode_dir=$dir/decode_${d}_pp
@@ -270,7 +271,7 @@ if [ $stage -le 15 ]; then
--extra-right-context $extra_right_context \
--extra-left-context-initial 0 --extra-right-context-final 0 \
--frames-per-chunk 160 \
- --online-ivector-dir exp/nnet3/ivectors_${d} \
+ --online-ivector-dir exp/nnet3${nnet3_affix}/ivectors_${d} \
$graph_dir data/${d}_hires $decode_dir || { echo "Failed decoding in $decode_dir"; touch $dir/.error; }
) &
done
@@ -292,7 +293,7 @@ if [ $stage -le 16 ]; then
--extra-left-context-initial 0 --extra-right-context-final 0 \
--sub-speaker-frames 6000 --max-count 75 --ivector-scale 0.75 \
--pass2-decode-opts "--min-active 1000" \
- dev_aspire data/lang $dir/graph_pp $dir
+ dev_aspire_ldc data/lang $dir/graph_pp $dir
fi

if [ $stage -le 17 ]; then
@@ -305,7 +306,7 @@ if [ $stage -le 17 ]; then
--extra-left-context-initial 0 \
--max-count 75 \
--pass2-decode-opts "--min-active 1000" \
- dev_aspire data/lang $dir/graph_pp $dir
+ dev_aspire_ldc data/lang $dir/graph_pp $dir
fi

exit 0;