WIP: add TDNNF to pytorch. #3892
Conversation
Cool! |
I find kaldi invokes kaldi/src/nnet3/nnet-training.cc lines 120 to 122 (https://github.com/kaldi-asr/kaldi/blob/5882dc51724b25d41799cad16a7c06c52a259503/src/nnet3/nnet-training.cc#L120-L122)
right after the update of parameters. I am going to invoke it after calling optimizer.step(). |
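For reference, a minimal sketch of where that call would sit in a PyTorch training loop; apply_kaldi_constraints, train_loader and criterion are hypothetical placeholders, not names from this pull request:
def apply_kaldi_constraints(model):
    # Placeholder for the per-update step nnet-training.cc performs right after the
    # parameter update (e.g. max-change / orthonormal constraints); real logic lives elsewhere.
    pass

for batch in train_loader:                 # hypothetical loader/criterion names
    optimizer.zero_grad()
    loss = criterion(model(batch))
    loss.backward()
    optimizer.step()
    apply_kaldi_constraints(model)         # invoked right after the update, as described above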
Yeah that makes sense.
|
The decoding result for TDNNF is as follows:
The first column is from [...]. The second column is the result from this pull request. The third column comes from #3868. The second column has a greater number of layers and a larger hidden dim than the first column. The second column has almost the same topology as the third column. The differences are:
I am not sure whether the above differences cause the inferior results for PyTorch. Another difference is the alignment information:
|
I have removed pitch and am running the training again. Now the feature part [...]. Regarding the [-1, 0, 1], TDNN networks use [...]. As for the weight matrix [...]:
In the PyTorch implementation, we use [...]. The above paper also proposes [...]. |
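As an aside on the [-1, 0, 1] context, a minimal sketch (illustrative only, not the code in this pull request) of expressing that splice in PyTorch as a Conv1d with kernel size 3; the dimensions are made up:
import torch
import torch.nn as nn

# Splicing frames t-1, t, t+1 is equivalent to a 1-D convolution over time with kernel_size=3.
splice = nn.Conv1d(in_channels=40, out_channels=1024, kernel_size=3)
x = torch.randn(8, 40, 150)   # (batch, feat_dim, num_frames)
y = splice(x)                 # shape (8, 1024, 148): one frame of context consumed on each side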
After removing pitch, the result becomes a little worse.
The above table is copied here for better comparison:
|
@csukuangfj -- did you look at the likelihoods? I wonder if it is overtraining or undertraining? |
@jtrmal Did you mean the objective function value?
Part of the training log is as follows:
2020-01-31 16:17:23,226 INFO [train.py:185] epoch 0, learning rate 0.001
2020-01-31 16:17:23,536 INFO [train.py:102] Process 0/3161(0.000000%) global average objf: -1.195890 over 6400.0 frames, current batch average objf: -1.195890 over 6400 frames, epoch 0
2020-01-31 16:17:44,072 INFO [train.py:102] Process 100/3161(3.163556%) global average objf: -0.687263 over 573696.0 frames, current batch average objf: -0.457086 over 6400 frames, epoch 0
2020-01-31 16:18:04,479 INFO [train.py:102] Process 200/3161(6.327112%) global average objf: -0.535040 over 1138432.0 frames, current batch average objf: -0.338968 over 6400 frames, epoch 0
2020-01-31 16:18:24,999 INFO [train.py:102] Process 300/3161(9.490668%) global average objf: -0.453431 over 1704064.0 frames, current batch average objf: -0.261345 over 6400 frames, epoch 0
2020-01-31 16:18:45,192 INFO [train.py:102] Process 400/3161(12.654223%) global average objf: -0.402034 over 2267136.0 frames, current batch average objf: -0.242083 over 6400 frames, epoch 0
....
2020-01-31 17:21:32,249 INFO [train.py:102] Process 2800/3161(88.579563%) global average objf: -0.060549 over 15840896.0 frames, current batch average objf: -0.064717 over 3840 frames, epoch 5
2020-01-31 17:21:53,120 INFO [train.py:102] Process 2900/3161(91.743119%) global average objf: -0.060385 over 16406528.0 frames, current batch average objf: -0.066644 over 3840 frames, epoch 5
2020-01-31 17:22:14,151 INFO [train.py:102] Process 3000/3161(94.906675%) global average objf: -0.060270 over 16973824.0 frames, current batch average objf: -0.047593 over 6400 frames, epoch 5
2020-01-31 17:22:34,801 INFO [train.py:102] Process 3100/3161(98.070231%) global average objf: -0.060135 over 17539456.0 frames, current batch average objf: -0.050985 over 6400 frames, epoch 5
The screenshot of the tensorboard is:
[image: Screen Shot 2020-01-31 at 17 55 07]
<https://user-images.githubusercontent.com/5284924/73530317-3592aa00-4453-11ea-9052-22026343cd0a.png>
How can you tell whether it is underfitting or overfitting from the objective function value? |
Well, we compute statistics from a small held-out subset of the training data (in the original Kaldi training). Those are the 'valid' logs (IIRC). I was wondering if something would be visible using those.
y,
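For the PyTorch side, a minimal sketch (not code from this pull request) of evaluating the same objective on a small held-out subset, so a train/valid gap can be inspected the way Kaldi's 'valid' logs allow; the criterion's return convention here is an assumption, not taken from train.py:
import torch

@torch.no_grad()
def compute_valid_objf(model, valid_loader, criterion):
    # Average the objective over a held-out subset; comparing this curve with the
    # training objf indicates over- or under-fitting.
    model.eval()
    total_objf, total_frames = 0.0, 0.0
    for batch in valid_loader:
        objf, num_frames = criterion(model(batch))  # assumed: returns (per-frame objf, #frames)
        total_objf += objf.item() * num_frames
        total_frames += num_frames
    model.train()
    return total_objf / total_frames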
|
Could the difference be explained by the different optimizers (Adam vs. NSGD)? |
Let's keep the features the same for now while we work out the other differences. There are likely quite a few differences, and I want to add more diagnostics to the PyTorch setup to help track it down in more detail. |
I agree. Great progress nonetheless!
y.
|
Run

num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
affine_opts="l2-regularize=0.008"
tdnnf_opts="l2-regularize=0.008 bypass-scale=0.66"
linear_opts="l2-regularize=0.008 orthonormal-constraint=-1.0"
prefinal_opts="l2-regularize=0.008"
output_opts="l2-regularize=0.002"

input dim=40 name=input
fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
relu-batchnorm-layer name=tdnn1 $affine_opts dim=1024
tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
tdnnf-layer name=tdnnf3 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
tdnnf-layer name=tdnnf4 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
tdnnf-layer name=tdnnf5 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=0
tdnnf-layer name=tdnnf6 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf7 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf8 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf9 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf10 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf11 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf12 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
tdnnf-layer name=tdnnf13 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
linear-component name=prefinal-l dim=256 $linear_opts
prefinal-layer name=prefinal-chain input=prefinal-l $prefinal_opts big-dim=1024 small-dim=256
output-layer name=output include-log-softmax=false dim=$num_targets $output_opts
prefinal-layer name=prefinal-xent input=prefinal-l $prefinal_opts big-dim=1024 small-dim=256
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts

Result
All results until now
TDNN
Both of them use the same config with [...]
TDNN-F
They use the same [...]
It seems that [...] |
@qindazhu thanks. |
No, it will not, if you leave the parameter [...]. ref.config of tdnn_1c:
ref.config of tdnn_1c_r_d:
|
I see. |
Cool, so it looks like dropout was not making a difference in this
situation.
|
@qindazhu |
By the way, PyTorch is significantly faster than kaldi. It took about 1 hour in total for 6 epochs in the current pull request. @fanlu reported in #3868 that kaldi took about 4 hours in total for 6 epochs. |
It might require script-level changes to turn off natural gradient. Let me figure out how to do that; I can do it within a couple of hours.
|
thanks a lot. |
Haowen, I'll give you some general directions...
You should probably do the changes to the natural-gradient stuff and the max-change separately, at least at first; if you do both, with the learning rates as they are, it will likely diverge.
Also, there are two max-change values: one is in the individual layers, passed through to their components (which you could set to -1 in the xconfig (`max-change=-1`) to disable), and one is a global one that's passed into the binary via the python script, probably via --max-change=2.0 to train.py, but run it with no args to check. Probably setting it to -1 will disable that as well. (It may diverge unless learning rates are decreased.)
Disabling natural gradient will require script changes in steps/libs/nnet3/xconfig/composite_layers.py, to XconfigTdnnfLayer, XconfigFinalLayer and XconfigPrefinalLayer. Basically, any instances of NaturalGradientAffineComponent should be changed to AffineComponent, and for any layers of type LinearComponent or TdnnComponent, the option `use-natural-gradient=false` should be added to their config lines. For the relu-batchnorm-layer or relu-batchnorm-dropout-layer, you can disable natural gradient by adding on the xconfig line:
ng-affine-options=alpha=1000000
I know that looks odd... it is a string with value alpha=1000000 (I don't remember if it's single or double quotes to quote a string, but they are not necessary since there are no spaces, I think).
After starting training you should search the progress logs for `alpha`, just to identify all potentially natural-gradient components, to verify that those components either have a huge alpha or have use-natural-gradient=false.
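To make this concrete, a hedged sketch of the edits, reusing the xconfig lines from the run above (untested; exact option spellings should be checked against the xconfig parsers):
# (a) disable per-component max-change (do this separately from the natural-gradient change):
relu-batchnorm-layer name=tdnn1 $affine_opts dim=1024 max-change=-1
tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1 max-change=-1
# ...and pass --max-change=-1 to train.py for the global max-change (check its defaults first).
# (b) disable natural gradient:
relu-batchnorm-layer name=tdnn1 $affine_opts dim=1024 ng-affine-options=alpha=1000000
# plus the composite_layers.py changes described above (NaturalGradientAffineComponent ->
# AffineComponent, and use-natural-gradient=false added to LinearComponent/TdnnComponent
# config lines).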
|
Sure, I will draw the L2 norm of all the weight matrices with tensorboard. |
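A minimal sketch of that, assuming torch.utils.tensorboard is used (not necessarily what produced the screenshots in this thread); the log directory and function name are made up:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/param_norms')   # hypothetical log directory

def log_param_l2_norms(model, global_step):
    # One scalar curve per parameter, so the norms can later be compared layer by layer
    # with the Kaldi model.
    for name, param in model.named_parameters():
        writer.add_scalar(f'l2_norm/{name}', param.detach().norm(p=2).item(), global_step)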
I have changed the model structure and forward function, and the result is [...]
But it's slower than before; it takes about 4 hours 20 minutes.
|
Cool! So getting closer. The l2 norm of the parameter matrices, compared with Kaldi's, may tell us what's going on with the optimization and help tune learning rates etc. |
Hi, @csukuangfj
When I use DataParallel to train the tdnnf model on multiple GPUs
model = torch.nn.DataParallel(model.cuda(), device_ids=list(range(args.ngpu)))
I get an error when the criterion is called
nnet_output = kaldi.PytorchToCuSubMatrix(to_dlpack(nnet_output_tensor))
and the error msg is below:
ASSERTION_FAILED ([5.5.717~1-e05890d]:ConsumeDLManagedTensor():dlpack/dlpack_pybind.cc:129) Assertion failed: (ctx->device_id == device_id)
Should we specify a fixed device_id in this function? |
1. I would use
device = ...
model = model.to(device)
kaldi.SelectGpuDeviceId(the id from the above device)
Do not use model.cuda(). (See the sketch after this list.)
2. I do not have much experience with DataParallel. I think the model forward() can be executed by DataParallel. The loss function has to be executed on the master GPU, which is usually the one with id == 0, and you have to set kaldi to use the same device id as the master GPU.
3. I would recommend using DDP, that is, DistributedDataParallel, where you can run model forward() and the loss computation in parallel.
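A minimal sketch of points 1 and 2 above (kaldi.SelectGpuDeviceId and kaldi.PytorchToCuSubMatrix are the pybind calls already mentioned in this thread; `model` and `args.ngpu` are assumed to be defined as in the snippet quoted above):
import torch
import kaldi   # the kaldi pybind module used in this recipe

master_gpu = 0                                   # DataParallel gathers outputs on this device
device = torch.device('cuda', master_gpu)
kaldi.SelectGpuDeviceId(master_gpu)              # set once, at the very beginning of the program
model = model.to(device)                         # note: model.to(device), not model.cuda()
model = torch.nn.DataParallel(model, device_ids=list(range(args.ngpu)))
# forward() now runs on all GPUs, but the loss (and the dlpack hand-off to Kaldi) runs on
# master_gpu, so the ctx->device_id == device_id check inside PytorchToCuSubMatrix holds.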
Sorry I'm busy today.
|
You have to set the gpu id of kaldi at the very beginning of the program.
No need to set it again inside the loss function.
|
Ok, I'll try
|
Regarding the weight decay: I would advise just tuning those separately. The constants are defined in quite different ways, and probably wouldn't even be comparable between Adam and SGD. |
I will do it this week.
> > I have drawn the distribution and histogram of parameters, e.g.: which layer should I focus on? And is there a tool to get the l2 norm of Kaldi's parameters?
> Regarding the weights: I don't really understand those plots, but I just wanted the 2-norm, which would be torch.sqrt((some_tensor ** 2).sum()), for each parameter. You might have to write a little code to get it.
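A small sketch of that suggestion, printing the 2-norm of every parameter of the PyTorch model; the function name is made up:
import torch

def print_param_l2_norms(model):
    # torch.sqrt((p ** 2).sum()) for each parameter, as suggested above.
    for name, p in model.named_parameters():
        l2 = torch.sqrt((p.detach() ** 2).sum()).item()
        print(f'{name}: shape={tuple(p.shape)}, l2-norm={l2:.6f}')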
|
Maybe this is a simple way: use the kaldi tool "nnet3-copy --binary=false final.mdl" to convert the mdl file to text mode, and then write a script to compute the 2-norm of the weights. |
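A hedged sketch of that second step; it assumes the text dump written by nnet3-copy prints each parameter matrix/vector as whitespace-separated numbers inside a bracketed block after a <LinearParams> or <BiasParams> tag, which may need adjusting for other component types:
import math
import re

def kaldi_text_param_norms(path='final.txt'):
    # Produced with: nnet3-copy --binary=false final.mdl final.txt
    text = open(path).read()
    norms = []
    for m in re.finditer(r'<(LinearParams|BiasParams)>\s*\[(.*?)\]', text, re.S):
        values = [float(v) for v in m.group(2).split()]
        norms.append((m.group(1), math.sqrt(sum(v * v for v in values))))
    return norms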
Oh, sorry; for Kaldi's model it will be printed in the progress.N.log, search for Norm.
|
I have used https://github.com/XiaoMi/kaldi-onnx.git and torch.norm / np.linalg.norm to calculate the l2-norms of the Kaldi and PyTorch model parameters; the full logs are in the message quoted below. |
Thanks. What I really wanted was to compare the l2 norms of the Kaldi model's parameters with the corresponding parameters of the PyTorch model.
…On Tue, Feb 11, 2020 at 3:07 PM fanlu wrote:
I have used https://github.com/XiaoMi/kaldi-onnx.git and
torch.norm / np.linalg.norm to calculate the l2-norm.
@danpovey please have a look and point me to what you want next. Thanks.
This is the l2_norm log of kaldi tdnn_1c:
2020-02-11 14:44:31,355 __main__ INFO {'dim': '40', 'name': 'input', 'node_type': 'input-node', 'type': 'Input', 'id': 1}
2020-02-11 14:44:31,356 __main__ INFO {'id': 4, 'type': 'Splice', 'name': 'splice_4', 'input': ['input'], 'context': [-1, 0, 1]}
2020-02-11 14:44:31,372 __main__ INFO {'input': ['splice_4'], 'component': 'lda', 'name': 'lda', 'node_type': 'component-node', 'id': 5, 'params': (120, 120), 'bias': (120,), 'type': 'Gemm', 'raw-type': 'FixedAffine', 'params-l2-norm': 0.3751292, 'bias-l2-norm': 0.02903783}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['lda'], 'component': 'tdnn1.affine', 'name': 'tdnn1.affine', 'node_type': 'component-node', 'id': 6, 'max_change': 0.75, 'params': (1024, 120), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 17.994331, 'bias-l2-norm': 2.1129487}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['tdnn1.affine'], 'component': 'tdnn1.relu', 'name': 'tdnn1.relu', 'node_type': 'component-node', 'id': 7, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 71424.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 1.8933748, 'deriv_avg-l2-norm': 17.662828, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['tdnn1.relu'], 'component': 'tdnn1.batchnorm', 'name': 'tdnn1.batchnorm', 'node_type': 'component-node', 'id': 8, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 179712.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 1.9120138, 'stats_var-l2-norm': 0.18313415}
2020-02-11 14:44:31,376 __main__ INFO {'input': ['tdnn1.batchnorm'], 'component': 'tdnn1.dropout', 'name': 'tdnn1.dropout', 'node_type': 'component-node', 'id': 9, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,376 __main__ INFO {'input': ['tdnn1.dropout'], 'component': 'tdnnf2.linear', 'name': 'tdnnf2.linear', 'node_type': 'component-node', 'id': 10, 'time_offsets': array([-1, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 14.65368}
2020-02-11 14:44:31,379 __main__ INFO {'input': ['tdnnf2.linear'], 'component': 'tdnnf2.affine', 'name': 'tdnnf2.affine', 'node_type': 'component-node', 'id': 11, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 12.9894, 'bias-l2-norm': 2.456446}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.affine'], 'component': 'tdnnf2.relu', 'name': 'tdnnf2.relu', 'node_type': 'component-node', 'id': 12, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 60928.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 12.335568, 'deriv_avg-l2-norm': 15.625699, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.relu'], 'component': 'tdnnf2.batchnorm', 'name': 'tdnnf2.batchnorm', 'node_type': 'component-node', 'id': 13, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 177792.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 12.284391, 'stats_var-l2-norm': 12.800878}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.batchnorm'], 'component': 'tdnnf2.dropout', 'name': 'tdnnf2.dropout', 'node_type': 'component-node', 'id': 14, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,380 __main__ INFO {'id': 15, 'type': 'Scale', 'name': 'tdnn1.dropout.Scale.0.66', 'input': ['tdnn1.dropout'], 'scale': 0.66}
2020-02-11 14:44:31,380 __main__ INFO {'id': 16, 'type': 'Sum', 'name': 'tdnn1.dropout.Scale.0.66.Sum.tdnnf2.dropout', 'input': ['tdnn1.dropout.Scale.0.66', 'tdnnf2.dropout']}
2020-02-11 14:44:31,381 __main__ INFO {'input': ['tdnn1.dropout.Scale.0.66.Sum.tdnnf2.dropout'], 'component': 'tdnnf2.noop', 'name': 'tdnnf2.noop', 'node_type': 'component-node', 'id': 17, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,381 __main__ INFO {'input': ['tdnnf2.noop'], 'component': 'tdnnf3.linear', 'name': 'tdnnf3.linear', 'node_type': 'component-node', 'id': 18, 'time_offsets': array([-1, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 12.562767}
2020-02-11 14:44:31,382 __main__ INFO {'input': ['tdnnf3.linear'], 'component': 'tdnnf3.affine', 'name': 'tdnnf3.affine', 'node_type': 'component-node', 'id': 19, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.749477, 'bias-l2-norm': 1.5780896}
2020-02-11 14:44:31,382 __main__ INFO {'input': ['tdnnf3.affine'], 'component': 'tdnnf3.relu', 'name': 'tdnnf3.relu', 'node_type': 'component-node', 'id': 20, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 105408.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 13.213889, 'deriv_avg-l2-norm': 15.905553, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,383 __main__ INFO {'input': ['tdnnf3.relu'], 'component': 'tdnnf3.batchnorm', 'name': 'tdnnf3.batchnorm', 'node_type': 'component-node', 'id': 21, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 175872.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 13.222022, 'stats_var-l2-norm': 13.43212}
2020-02-11 14:44:31,383 __main__ INFO {'input': ['tdnnf3.batchnorm'], 'component': 'tdnnf3.dropout', 'name': 'tdnnf3.dropout', 'node_type': 'component-node', 'id': 22, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,383 __main__ INFO {'id': 23, 'type': 'Scale', 'name': 'tdnnf2.noop.Scale.0.66', 'input': ['tdnnf2.noop'], 'scale': 0.66}
2020-02-11 14:44:31,383 __main__ INFO {'id': 24, 'type': 'Sum', 'name': 'tdnnf2.noop.Scale.0.66.Sum.tdnnf3.dropout', 'input': ['tdnnf2.noop.Scale.0.66', 'tdnnf3.dropout']}
2020-02-11 14:44:31,384 __main__ INFO {'input': ['tdnnf2.noop.Scale.0.66.Sum.tdnnf3.dropout'], 'component': 'tdnnf3.noop', 'name': 'tdnnf3.noop', 'node_type': 'component-node', 'id': 25, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,384 __main__ INFO {'input': ['tdnnf3.noop'], 'component': 'tdnnf4.linear', 'name': 'tdnnf4.linear', 'node_type': 'component-node', 'id': 26, 'time_offsets': array([-1, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.342851}
2020-02-11 14:44:31,385 __main__ INFO {'input': ['tdnnf4.linear'], 'component': 'tdnnf4.affine', 'name': 'tdnnf4.affine', 'node_type': 'component-node', 'id': 27, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.7689705, 'bias-l2-norm': 1.2836239}
2020-02-11 14:44:31,385 __main__ INFO {'input': ['tdnnf4.affine'], 'component': 'tdnnf4.relu', 'name': 'tdnnf4.relu', 'node_type': 'component-node', 'id': 28, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 26880.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 12.693966, 'deriv_avg-l2-norm': 15.441769, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf4.relu'], 'component': 'tdnnf4.batchnorm', 'name': 'tdnnf4.batchnorm', 'node_type': 'component-node', 'id': 29, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 58624.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 12.727512, 'stats_var-l2-norm': 13.328432}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf4.batchnorm'], 'component': 'tdnnf4.dropout', 'name': 'tdnnf4.dropout', 'node_type': 'component-node', 'id': 30, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,386 __main__ INFO {'id': 31, 'type': 'Scale', 'name': 'tdnnf3.noop.Scale.0.66', 'input': ['tdnnf3.noop'], 'scale': 0.66}
2020-02-11 14:44:31,386 __main__ INFO {'id': 32, 'type': 'Sum', 'name': 'tdnnf3.noop.Scale.0.66.Sum.tdnnf4.dropout', 'input': ['tdnnf3.noop.Scale.0.66', 'tdnnf4.dropout']}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf3.noop.Scale.0.66.Sum.tdnnf4.dropout'], 'component': 'tdnnf4.noop', 'name': 'tdnnf4.noop', 'node_type': 'component-node', 'id': 33, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,387 __main__ INFO {'input': ['tdnnf4.noop'], 'component': 'tdnnf5.linear', 'name': 'tdnnf5.linear', 'node_type': 'component-node', 'id': 34, 'time_offsets': array([0]), 'params': (128, 1024), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.509507}
2020-02-11 14:44:31,387 __main__ INFO {'input': ['tdnnf5.linear'], 'component': 'tdnnf5.affine', 'name': 'tdnnf5.affine', 'node_type': 'component-node', 'id': 35, 'time_offsets': array([0]), 'params': (1024, 128), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 7.895306, 'bias-l2-norm': 1.8074945}
2020-02-11 14:44:31,388 __main__ INFO {'input': ['tdnnf5.affine'], 'component': 'tdnnf5.relu', 'name': 'tdnnf5.relu', 'node_type': 'component-node', 'id': 36, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 35328.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.929157, 'deriv_avg-l2-norm': 13.615622, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,388 __main__ INFO {'input': ['tdnnf5.relu'], 'component': 'tdnnf5.batchnorm', 'name': 'tdnnf5.batchnorm', 'node_type': 'component-node', 'id': 37, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 58624.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.9155283, 'stats_var-l2-norm': 4.8078284}
2020-02-11 14:44:31,389 __main__ INFO {'input': ['tdnnf5.batchnorm'], 'component': 'tdnnf5.dropout', 'name': 'tdnnf5.dropout', 'node_type': 'component-node', 'id': 38, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,389 __main__ INFO {'id': 39, 'type': 'Scale', 'name': 'tdnnf4.noop.Scale.0.66', 'input': ['tdnnf4.noop'], 'scale': 0.66}
2020-02-11 14:44:31,389 __main__ INFO {'id': 40, 'type': 'Sum', 'name': 'tdnnf4.noop.Scale.0.66.Sum.tdnnf5.dropout', 'input': ['tdnnf4.noop.Scale.0.66', 'tdnnf5.dropout']}
2020-02-11 14:44:31,389 __main__ INFO {'input': ['tdnnf4.noop.Scale.0.66.Sum.tdnnf5.dropout'], 'component': 'tdnnf5.noop', 'name': 'tdnnf5.noop', 'node_type': 'component-node', 'id': 41, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,390 __main__ INFO {'input': ['tdnnf5.noop'], 'component': 'tdnnf6.linear', 'name': 'tdnnf6.linear', 'node_type': 'component-node', 'id': 42, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.609792}
2020-02-11 14:44:31,390 __main__ INFO {'input': ['tdnnf6.linear'], 'component': 'tdnnf6.affine', 'name': 'tdnnf6.affine', 'node_type': 'component-node', 'id': 43, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.078222, 'bias-l2-norm': 1.6047498}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.affine'], 'component': 'tdnnf6.relu', 'name': 'tdnnf6.relu', 'node_type': 'component-node', 'id': 44, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 18624.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 11.288168, 'deriv_avg-l2-norm': 15.204925, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.relu'], 'component': 'tdnnf6.batchnorm', 'name': 'tdnnf6.batchnorm', 'node_type': 'component-node', 'id': 45, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 56704.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 11.27561, 'stats_var-l2-norm': 10.5281}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.batchnorm'], 'component': 'tdnnf6.dropout', 'name': 'tdnnf6.dropout', 'node_type': 'component-node', 'id': 46, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,392 __main__ INFO {'id': 47, 'type': 'Scale', 'name': 'tdnnf5.noop.Scale.0.66', 'input': ['tdnnf5.noop'], 'scale': 0.66}
2020-02-11 14:44:31,392 __main__ INFO {'id': 48, 'type': 'Sum', 'name': 'tdnnf5.noop.Scale.0.66.Sum.tdnnf6.dropout', 'input': ['tdnnf5.noop.Scale.0.66', 'tdnnf6.dropout']}
2020-02-11 14:44:31,392 __main__ INFO {'input': ['tdnnf5.noop.Scale.0.66.Sum.tdnnf6.dropout'], 'component': 'tdnnf6.noop', 'name': 'tdnnf6.noop', 'node_type': 'component-node', 'id': 49, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,392 __main__ INFO {'input': ['tdnnf6.noop'], 'component': 'tdnnf7.linear', 'name': 'tdnnf7.linear', 'node_type': 'component-node', 'id': 50, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.21937}
2020-02-11 14:44:31,393 __main__ INFO {'input': ['tdnnf7.linear'], 'component': 'tdnnf7.affine', 'name': 'tdnnf7.affine', 'node_type': 'component-node', 'id': 51, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.733917, 'bias-l2-norm': 1.7020972}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.affine'], 'component': 'tdnnf7.relu', 'name': 'tdnnf7.relu', 'node_type': 'component-node', 'id': 52, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 25920.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.964567, 'deriv_avg-l2-norm': 14.876908, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.relu'], 'component': 'tdnnf7.batchnorm', 'name': 'tdnnf7.batchnorm', 'node_type': 'component-node', 'id': 53, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 54784.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.983768, 'stats_var-l2-norm': 8.700165}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.batchnorm'], 'component': 'tdnnf7.dropout', 'name': 'tdnnf7.dropout', 'node_type': 'component-node', 'id': 54, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,394 __main__ INFO {'id': 55, 'type': 'Scale', 'name': 'tdnnf6.noop.Scale.0.66', 'input': ['tdnnf6.noop'], 'scale': 0.66}
2020-02-11 14:44:31,395 __main__ INFO {'id': 56, 'type': 'Sum', 'name': 'tdnnf6.noop.Scale.0.66.Sum.tdnnf7.dropout', 'input': ['tdnnf6.noop.Scale.0.66', 'tdnnf7.dropout']}
2020-02-11 14:44:31,395 __main__ INFO {'input': ['tdnnf6.noop.Scale.0.66.Sum.tdnnf7.dropout'], 'component': 'tdnnf7.noop', 'name': 'tdnnf7.noop', 'node_type': 'component-node', 'id': 57, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,395 __main__ INFO {'input': ['tdnnf7.noop'], 'component': 'tdnnf8.linear', 'name': 'tdnnf8.linear', 'node_type': 'component-node', 'id': 58, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.982727}
2020-02-11 14:44:31,396 __main__ INFO {'input': ['tdnnf8.linear'], 'component': 'tdnnf8.affine', 'name': 'tdnnf8.affine', 'node_type': 'component-node', 'id': 59, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.5415945, 'bias-l2-norm': 1.716922}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.affine'], 'component': 'tdnnf8.relu', 'name': 'tdnnf8.relu', 'node_type': 'component-node', 'id': 60, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 22208.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.486838, 'deriv_avg-l2-norm': 14.476424, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.relu'], 'component': 'tdnnf8.batchnorm', 'name': 'tdnnf8.batchnorm', 'node_type': 'component-node', 'id': 61, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 52864.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.471879, 'stats_var-l2-norm': 8.468856}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.batchnorm'], 'component': 'tdnnf8.dropout', 'name': 'tdnnf8.dropout', 'node_type': 'component-node', 'id': 62, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,397 __main__ INFO {'id': 63, 'type': 'Scale', 'name': 'tdnnf7.noop.Scale.0.66', 'input': ['tdnnf7.noop'], 'scale': 0.66}
2020-02-11 14:44:31,398 __main__ INFO {'id': 64, 'type': 'Sum', 'name': 'tdnnf7.noop.Scale.0.66.Sum.tdnnf8.dropout', 'input': ['tdnnf7.noop.Scale.0.66', 'tdnnf8.dropout']}
2020-02-11 14:44:31,398 __main__ INFO {'input': ['tdnnf7.noop.Scale.0.66.Sum.tdnnf8.dropout'], 'component': 'tdnnf8.noop', 'name': 'tdnnf8.noop', 'node_type': 'component-node', 'id': 65, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,398 __main__ INFO {'input': ['tdnnf8.noop'], 'component': 'tdnnf9.linear', 'name': 'tdnnf9.linear', 'node_type': 'component-node', 'id': 66, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.8021345}
2020-02-11 14:44:31,399 __main__ INFO {'input': ['tdnnf9.linear'], 'component': 'tdnnf9.affine', 'name': 'tdnnf9.affine', 'node_type': 'component-node', 'id': 67, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.299649, 'bias-l2-norm': 1.5847737}
2020-02-11 14:44:31,399 __main__ INFO {'input': ['tdnnf9.affine'], 'component': 'tdnnf9.relu', 'name': 'tdnnf9.relu', 'node_type': 'component-node', 'id': 68, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23296.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.041475, 'deriv_avg-l2-norm': 14.187531, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,400 __main__ INFO {'input': ['tdnnf9.relu'], 'component': 'tdnnf9.batchnorm', 'name': 'tdnnf9.batchnorm', 'node_type': 'component-node', 'id': 69, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 50944.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.050355, 'stats_var-l2-norm': 8.135726}
2020-02-11 14:44:31,400 __main__ INFO {'input': ['tdnnf9.batchnorm'], 'component': 'tdnnf9.dropout', 'name': 'tdnnf9.dropout', 'node_type': 'component-node', 'id': 70, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,400 __main__ INFO {'id': 71, 'type': 'Scale', 'name': 'tdnnf8.noop.Scale.0.66', 'input': ['tdnnf8.noop'], 'scale': 0.66}
2020-02-11 14:44:31,400 __main__ INFO {'id': 72, 'type': 'Sum', 'name': 'tdnnf8.noop.Scale.0.66.Sum.tdnnf9.dropout', 'input': ['tdnnf8.noop.Scale.0.66', 'tdnnf9.dropout']}
2020-02-11 14:44:31,401 __main__ INFO {'input': ['tdnnf8.noop.Scale.0.66.Sum.tdnnf9.dropout'], 'component': 'tdnnf9.noop', 'name': 'tdnnf9.noop', 'node_type': 'component-node', 'id': 73, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,401 __main__ INFO {'input': ['tdnnf9.noop'], 'component': 'tdnnf10.linear', 'name': 'tdnnf10.linear', 'node_type': 'component-node', 'id': 74, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.755399}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.linear'], 'component': 'tdnnf10.affine', 'name': 'tdnnf10.affine', 'node_type': 'component-node', 'id': 75, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.086278, 'bias-l2-norm': 1.3546987}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.affine'], 'component': 'tdnnf10.relu', 'name': 'tdnnf10.relu', 'node_type': 'component-node', 'id': 76, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23232.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.134641, 'deriv_avg-l2-norm': 13.997164, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.relu'], 'component': 'tdnnf10.batchnorm', 'name': 'tdnnf10.batchnorm', 'node_type': 'component-node', 'id': 77, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 49024.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.114295, 'stats_var-l2-norm': 8.443671}
2020-02-11 14:44:31,403 __main__ INFO {'input': ['tdnnf10.batchnorm'], 'component': 'tdnnf10.dropout', 'name': 'tdnnf10.dropout', 'node_type': 'component-node', 'id': 78, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,403 __main__ INFO {'id': 79, 'type': 'Scale', 'name': 'tdnnf9.noop.Scale.0.66', 'input': ['tdnnf9.noop'], 'scale': 0.66}
2020-02-11 14:44:31,403 __main__ INFO {'id': 80, 'type': 'Sum', 'name': 'tdnnf9.noop.Scale.0.66.Sum.tdnnf10.dropout', 'input': ['tdnnf9.noop.Scale.0.66', 'tdnnf10.dropout']}
2020-02-11 14:44:31,403 __main__ INFO {'input': ['tdnnf9.noop.Scale.0.66.Sum.tdnnf10.dropout'], 'component': 'tdnnf10.noop', 'name': 'tdnnf10.noop', 'node_type': 'component-node', 'id': 81, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,404 __main__ INFO {'input': ['tdnnf10.noop'], 'component': 'tdnnf11.linear', 'name': 'tdnnf11.linear', 'node_type': 'component-node', 'id': 82, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.697865}
2020-02-11 14:44:31,404 __main__ INFO {'input': ['tdnnf11.linear'], 'component': 'tdnnf11.affine', 'name': 'tdnnf11.affine', 'node_type': 'component-node', 'id': 83, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.883663, 'bias-l2-norm': 1.2464023}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.affine'], 'component': 'tdnnf11.relu', 'name': 'tdnnf11.relu', 'node_type': 'component-node', 'id': 84, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 31680.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 8.557134, 'deriv_avg-l2-norm': 13.096737, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.relu'], 'component': 'tdnnf11.batchnorm', 'name': 'tdnnf11.batchnorm', 'node_type': 'component-node', 'id': 85, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 47104.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 8.5475025, 'stats_var-l2-norm': 8.175571}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.batchnorm'], 'component': 'tdnnf11.dropout', 'name': 'tdnnf11.dropout', 'node_type': 'component-node', 'id': 86, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,406 __main__ INFO {'id': 87, 'type': 'Scale', 'name': 'tdnnf10.noop.Scale.0.66', 'input': ['tdnnf10.noop'], 'scale': 0.66}
2020-02-11 14:44:31,406 __main__ INFO {'id': 88, 'type': 'Sum', 'name': 'tdnnf10.noop.Scale.0.66.Sum.tdnnf11.dropout', 'input': ['tdnnf10.noop.Scale.0.66', 'tdnnf11.dropout']}
2020-02-11 14:44:31,406 __main__ INFO {'input': ['tdnnf10.noop.Scale.0.66.Sum.tdnnf11.dropout'], 'component': 'tdnnf11.noop', 'name': 'tdnnf11.noop', 'node_type': 'component-node', 'id': 89, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,406 __main__ INFO {'input': ['tdnnf11.noop'], 'component': 'tdnnf12.linear', 'name': 'tdnnf12.linear', 'node_type': 'component-node', 'id': 90, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.470395}
2020-02-11 14:44:31,407 __main__ INFO {'input': ['tdnnf12.linear'], 'component': 'tdnnf12.affine', 'name': 'tdnnf12.affine', 'node_type': 'component-node', 'id': 91, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.823896, 'bias-l2-norm': 1.1193717}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.affine'], 'component': 'tdnnf12.relu', 'name': 'tdnnf12.relu', 'node_type': 'component-node', 'id': 92, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 16640.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 7.735138, 'deriv_avg-l2-norm': 12.613586, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.relu'], 'component': 'tdnnf12.batchnorm', 'name': 'tdnnf12.batchnorm', 'node_type': 'component-node', 'id': 93, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 45184.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 7.646413, 'stats_var-l2-norm': 7.3074822}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.batchnorm'], 'component': 'tdnnf12.dropout', 'name': 'tdnnf12.dropout', 'node_type': 'component-node', 'id': 94, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,408 __main__ INFO {'id': 95, 'type': 'Scale', 'name': 'tdnnf11.noop.Scale.0.66', 'input': ['tdnnf11.noop'], 'scale': 0.66}
2020-02-11 14:44:31,409 __main__ INFO {'id': 96, 'type': 'Sum', 'name': 'tdnnf11.noop.Scale.0.66.Sum.tdnnf12.dropout', 'input': ['tdnnf11.noop.Scale.0.66', 'tdnnf12.dropout']}
2020-02-11 14:44:31,409 __main__ INFO {'input': ['tdnnf11.noop.Scale.0.66.Sum.tdnnf12.dropout'], 'component': 'tdnnf12.noop', 'name': 'tdnnf12.noop', 'node_type': 'component-node', 'id': 97, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,409 __main__ INFO {'input': ['tdnnf12.noop'], 'component': 'tdnnf13.linear', 'name': 'tdnnf13.linear', 'node_type': 'component-node', 'id': 98, 'time_offsets': array([-3, 0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.259589}
2020-02-11 14:44:31,410 __main__ INFO {'input': ['tdnnf13.linear'], 'component': 'tdnnf13.affine', 'name': 'tdnnf13.affine', 'node_type': 'component-node', 'id': 99, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.941648, 'bias-l2-norm': 0.99510527}
2020-02-11 14:44:31,410 __main__ INFO {'input': ['tdnnf13.affine'], 'component': 'tdnnf13.relu', 'name': 'tdnnf13.relu', 'node_type': 'component-node', 'id': 100, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 32512.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 5.9976406, 'deriv_avg-l2-norm': 11.490221, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf13.relu'], 'component': 'tdnnf13.batchnorm', 'name': 'tdnnf13.batchnorm', 'node_type': 'component-node', 'id': 101, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 5.9866805, 'stats_var-l2-norm': 5.459227}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf13.batchnorm'], 'component': 'tdnnf13.dropout', 'name': 'tdnnf13.dropout', 'node_type': 'component-node', 'id': 102, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,411 __main__ INFO {'id': 103, 'type': 'Scale', 'name': 'tdnnf12.noop.Scale.0.66', 'input': ['tdnnf12.noop'], 'scale': 0.66}
2020-02-11 14:44:31,411 __main__ INFO {'id': 104, 'type': 'Sum', 'name': 'tdnnf12.noop.Scale.0.66.Sum.tdnnf13.dropout', 'input': ['tdnnf12.noop.Scale.0.66', 'tdnnf13.dropout']}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf12.noop.Scale.0.66.Sum.tdnnf13.dropout'], 'component': 'tdnnf13.noop', 'name': 'tdnnf13.noop', 'node_type': 'component-node', 'id': 105, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,412 __main__ INFO {'input': ['tdnnf13.noop'], 'component': 'prefinal-l', 'name': 'prefinal-l', 'node_type': 'component-node', 'id': 106, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 14.927766}
2020-02-11 14:44:31,412 __main__ INFO {'input': ['prefinal-l'], 'component': 'prefinal-chain.affine', 'name': 'prefinal-chain.affine', 'node_type': 'component-node', 'id': 107, 'max_change': 0.75, 'params': (1024, 256), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 9.80533, 'bias-l2-norm': 1.433072}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.affine'], 'component': 'prefinal-chain.relu', 'name': 'prefinal-chain.relu', 'node_type': 'component-node', 'id': 108, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 19712.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.8817225, 'deriv_avg-l2-norm': 12.461684, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.relu'], 'component': 'prefinal-chain.batchnorm1', 'name': 'prefinal-chain.batchnorm1', 'node_type': 'component-node', 'id': 109, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.939498, 'stats_var-l2-norm': 6.1382785}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.batchnorm1'], 'component': 'prefinal-chain.linear', 'name': 'prefinal-chain.linear', 'node_type': 'component-node', 'id': 110, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 14.669719}
2020-02-11 14:44:31,414 __main__ INFO {'input': ['prefinal-chain.linear'], 'component': 'prefinal-chain.batchnorm2', 'name': 'prefinal-chain.batchnorm2', 'node_type': 'component-node', 'id': 111, 'dim': 256, 'block_dim': 256, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (256,), 'stats_var': (256,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 3.6812432e-07, 'stats_var-l2-norm': 21.627495}
2020-02-11 14:44:31,415 __main__ INFO {'input': ['prefinal-chain.batchnorm2'], 'component': 'output.affine', 'name': 'output.affine', 'node_type': 'component-node', 'id': 112, 'max_change': 1.5, 'params': (3448, 256), 'bias': (3448,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 33.547924, 'bias-l2-norm': 6.109993}
2020-02-11 14:44:31,415 __main__ INFO {'objective': 'linear', 'input': ['output.affine'], 'name': 'output', 'node_type': 'output-node', 'type': 'Output', 'id': 113}
2020-02-11 14:44:31,415 __main__ INFO {'input': ['prefinal-l'], 'component': 'prefinal-xent.affine', 'name': 'prefinal-xent.affine', 'node_type': 'component-node', 'id': 114, 'max_change': 0.75, 'params': (1024, 256), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 8.215993, 'bias-l2-norm': 2.358821}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.affine'], 'component': 'prefinal-xent.relu', 'name': 'prefinal-xent.relu', 'node_type': 'component-node', 'id': 115, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23936.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.315324, 'deriv_avg-l2-norm': 12.559063, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.relu'], 'component': 'prefinal-xent.batchnorm1', 'name': 'prefinal-xent.batchnorm1', 'node_type': 'component-node', 'id': 116, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.244672, 'stats_var-l2-norm': 3.9114242}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.batchnorm1'], 'component': 'prefinal-xent.linear', 'name': 'prefinal-xent.linear', 'node_type': 'component-node', 'id': 117, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 10.986344}
2020-02-11 14:44:31,417 __main__ INFO {'input': ['prefinal-xent.linear'], 'component': 'prefinal-xent.batchnorm2', 'name': 'prefinal-xent.batchnorm2', 'node_type': 'component-node', 'id': 118, 'dim': 256, 'block_dim': 256, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (256,), 'stats_var': (256,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 2.989858e-07, 'stats_var-l2-norm': 6.3436804}
2020-02-11 14:44:31,418 __main__ INFO {'input': ['prefinal-xent.batchnorm2'], 'component': 'output-xent.affine', 'name': 'output-xent.affine', 'node_type': 'component-node', 'id': 119, 'max_change': 1.5, 'params': (3448, 256), 'bias': (3448,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 54.274277, 'bias-l2-norm': 2.9230652}
2020-02-11 14:44:31,418 __main__ INFO {'input': ['output-xent.affine'], 'component': 'output-xent.log-softmax', 'name': 'output-xent.log-softmax', 'node_type': 'component-node', 'id': 120, 'dim': 3448, 'value_avg': array([], dtype=float32), 'deriv_avg': array([], dtype=float32), 'count': 0.0, 'oderiv_rms': (3448,), 'oderiv_count': 0.0, 'type': 'LogSoftmax', 'raw-type': 'LogSoftmax', 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,418 __main__ INFO {'objective': 'linear', 'input': ['output-xent.log-softmax'], 'name': 'output-xent', 'node_type': 'output-node', 'type': 'Output', 'id': 121}
And this is the l2-norm of the PyTorch model's parameters:
2020-02-11 14:59:18,264 (common:38) INFO: load checkpoint from /mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/exp/chain/train_q2_orthogonal_modelmodel_tdnnf3_def_init_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn1024_fpr1500000_ms1_2_3_4_5_kernel1_1_1_0_3_3_3_3_3_3_3_3_stride1_1_1_3_1_1_1_1_1_1_1_1_l2r5e-4/best_model.pt
2020-02-11 14:59:19,863 (model_tdnnf3:224) INFO: name: tdnn1_affine.weight, shape: torch.Size([1024, 129]), l2-norm: 13.578791618347168, np-l2-norm: 13.578801155090332
2020-02-11 14:59:28,992 (model_tdnnf3:224) INFO: name: tdnn1_affine.bias, shape: torch.Size([1024]), l2-norm: 2.2701199054718018, np-l2-norm: 2.270120859146118
2020-02-11 14:59:28,992 (model_tdnnf3:224) INFO: name: tdnn1_batchnorm.weight, shape: torch.Size([1024]), l2-norm: 10.799092292785645, np-l2-norm: 10.799091339111328
2020-02-11 14:59:28,993 (model_tdnnf3:224) INFO: name: tdnn1_batchnorm.bias, shape: torch.Size([1024]), l2-norm: 2.189619302749634, np-l2-norm: 2.1896190643310547
2020-02-11 14:59:28,994 (model_tdnnf3:224) INFO: name: tdnnfs.0.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 11.698036193847656, np-l2-norm: 11.698058128356934
2020-02-11 14:59:28,994 (model_tdnnf3:224) INFO: name: tdnnfs.0.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 12.456302642822266, np-l2-norm: 12.45630931854248
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.affine.bias, shape: torch.Size([1024]), l2-norm: 0.8999515175819397, np-l2-norm: 0.8999518156051636
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 8.536678314208984, np-l2-norm: 8.536681175231934
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.9667662382125854, np-l2-norm: 0.9667660593986511
2020-02-11 14:59:28,996 (model_tdnnf3:224) INFO: name: tdnnfs.1.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 11.157291412353516, np-l2-norm: 11.15730094909668
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.972347259521484, np-l2-norm: 10.972354888916016
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6986019611358643, np-l2-norm: 0.6986021399497986
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 7.437180995941162, np-l2-norm: 7.437183856964111
2020-02-11 14:59:28,998 (model_tdnnf3:224) INFO: name: tdnnfs.1.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6932032704353333, np-l2-norm: 0.6932030916213989
2020-02-11 14:59:28,998 (model_tdnnf3:224) INFO: name: tdnnfs.2.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.548410415649414, np-l2-norm: 10.548415184020996
2020-02-11 14:59:28,999 (model_tdnnf3:224) INFO: name: tdnnfs.2.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.08105754852295, np-l2-norm: 10.081061363220215
2020-02-11 14:59:28,999 (model_tdnnf3:224) INFO: name: tdnnfs.2.affine.bias, shape: torch.Size([1024]), l2-norm: 0.5530431866645813, np-l2-norm: 0.5530433654785156
2020-02-11 14:59:29,000 (model_tdnnf3:224) INFO: name: tdnnfs.2.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 7.182321548461914, np-l2-norm: 7.182323932647705
2020-02-11 14:59:29,000 (model_tdnnf3:224) INFO: name: tdnnfs.2.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7540080547332764, np-l2-norm: 0.7540078163146973
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.linear.conv.weight, shape: torch.Size([128, 1024, 1]), l2-norm: 7.231468677520752, np-l2-norm: 7.23146915435791
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.affine.weight, shape: torch.Size([1024, 128, 1]), l2-norm: 7.3756842613220215, np-l2-norm: 7.37568473815918
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.affine.bias, shape: torch.Size([1024]), l2-norm: 0.8493649363517761, np-l2-norm: 0.8493649959564209
2020-02-11 14:59:29,002 (model_tdnnf3:224) INFO: name: tdnnfs.3.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 4.721061706542969, np-l2-norm: 4.721061706542969
2020-02-11 14:59:29,002 (model_tdnnf3:224) INFO: name: tdnnfs.3.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.8471177816390991, np-l2-norm: 0.8471177220344543
2020-02-11 14:59:29,003 (model_tdnnf3:224) INFO: name: tdnnfs.4.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.269883155822754, np-l2-norm: 10.26988697052002
2020-02-11 14:59:29,003 (model_tdnnf3:224) INFO: name: tdnnfs.4.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 11.01193904876709, np-l2-norm: 11.011943817138672
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6541978120803833, np-l2-norm: 0.6541979312896729
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.021320343017578, np-l2-norm: 6.021320343017578
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7195524573326111, np-l2-norm: 0.7195526957511902
2020-02-11 14:59:29,005 (model_tdnnf3:224) INFO: name: tdnnfs.5.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.503177642822266, np-l2-norm: 10.503182411193848
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 11.230865478515625, np-l2-norm: 11.23087215423584
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6815171837806702, np-l2-norm: 0.6815172433853149
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.027979373931885, np-l2-norm: 6.027980327606201
2020-02-11 14:59:29,007 (model_tdnnf3:224) INFO: name: tdnnfs.5.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.66847163438797, np-l2-norm: 0.6684714555740356
2020-02-11 14:59:29,007 (model_tdnnf3:224) INFO: name: tdnnfs.6.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.178318977355957, np-l2-norm: 10.178319931030273
2020-02-11 14:59:29,008 (model_tdnnf3:224) INFO: name: tdnnfs.6.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.798919677734375, np-l2-norm: 10.798924446105957
2020-02-11 14:59:29,008 (model_tdnnf3:224) INFO: name: tdnnfs.6.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7030293345451355, np-l2-norm: 0.703029453754425
2020-02-11 14:59:29,009 (model_tdnnf3:224) INFO: name: tdnnfs.6.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 5.839771747589111, np-l2-norm: 5.839775085449219
2020-02-11 14:59:29,009 (model_tdnnf3:224) INFO: name: tdnnfs.6.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6625904440879822, np-l2-norm: 0.6625903248786926
2020-02-11 14:59:29,010 (model_tdnnf3:224) INFO: name: tdnnfs.7.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.609781265258789, np-l2-norm: 10.609784126281738
2020-02-11 14:59:29,010 (model_tdnnf3:224) INFO: name: tdnnfs.7.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.92614459991455, np-l2-norm: 10.926151275634766
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7368164658546448, np-l2-norm: 0.7368165850639343
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.065881252288818, np-l2-norm: 6.065880298614502
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6848169565200806, np-l2-norm: 0.6848171353340149
2020-02-11 14:59:29,012 (model_tdnnf3:224) INFO: name: tdnnfs.8.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.567127227783203, np-l2-norm: 10.567130088806152
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.671625137329102, np-l2-norm: 10.671629905700684
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7605870962142944, np-l2-norm: 0.7605868577957153
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.370297908782959, np-l2-norm: 6.37030029296875
2020-02-11 14:59:29,014 (model_tdnnf3:224) INFO: name: tdnnfs.8.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.673714280128479, np-l2-norm: 0.6737140417098999
2020-02-11 14:59:29,014 (model_tdnnf3:224) INFO: name: tdnnfs.9.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.171960830688477, np-l2-norm: 10.17195987701416
2020-02-11 14:59:29,015 (model_tdnnf3:224) INFO: name: tdnnfs.9.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.168111801147461, np-l2-norm: 10.168119430541992
2020-02-11 14:59:29,015 (model_tdnnf3:224) INFO: name: tdnnfs.9.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7516912817955017, np-l2-norm: 0.7516909241676331
2020-02-11 14:59:29,016 (model_tdnnf3:224) INFO: name: tdnnfs.9.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.299006938934326, np-l2-norm: 6.299007892608643
2020-02-11 14:59:29,016 (model_tdnnf3:224) INFO: name: tdnnfs.9.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6358782649040222, np-l2-norm: 0.6358781456947327
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.095071792602539, np-l2-norm: 10.095071792602539
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.038125038146973, np-l2-norm: 10.038130760192871
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7603970766067505, np-l2-norm: 0.7603970170021057
2020-02-11 14:59:29,018 (model_tdnnf3:224) INFO: name: tdnnfs.10.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.457774639129639, np-l2-norm: 6.457772731781006
2020-02-11 14:59:29,018 (model_tdnnf3:224) INFO: name: tdnnfs.10.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7173940539360046, np-l2-norm: 0.7173939347267151
2020-02-11 14:59:29,019 (model_tdnnf3:224) INFO: name: tdnnfs.11.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 9.877976417541504, np-l2-norm: 9.877982139587402
2020-02-11 14:59:29,019 (model_tdnnf3:224) INFO: name: tdnnfs.11.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 9.827290534973145, np-l2-norm: 9.827296257019043
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7950196266174316, np-l2-norm: 0.7950197458267212
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.03067684173584, np-l2-norm: 6.03067684173584
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 1.1897754669189453, np-l2-norm: 1.1897754669189453
2020-02-11 14:59:29,021 (model_tdnnf3:224) INFO: name: prefinal_l.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 12.372564315795898, np-l2-norm: 12.372568130493164
2020-02-11 14:59:29,022 (model_tdnnf3:224) INFO: name: prefinal_chain.affine.weight, shape: torch.Size([1024, 256]), l2-norm: 10.835281372070312, np-l2-norm: 10.835285186767578
2020-02-11 14:59:29,022 (model_tdnnf3:224) INFO: name: prefinal_chain.affine.bias, shape: torch.Size([1024]), l2-norm: 1.2527509927749634, np-l2-norm: 1.2527514696121216
2020-02-11 14:59:29,023 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm1.weight, shape: torch.Size([1024]), l2-norm: 10.376221656799316, np-l2-norm: 10.37622356414795
2020-02-11 14:59:29,024 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm1.bias, shape: torch.Size([1024]), l2-norm: 1.386192707286682e-05, np-l2-norm: 1.3861925253877416e-05
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.linear.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 10.254746437072754, np-l2-norm: 10.254751205444336
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm2.weight, shape: torch.Size([256]), l2-norm: 7.542169570922852, np-l2-norm: 7.542168140411377
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm2.bias, shape: torch.Size([256]), l2-norm: 0.5150323510169983, np-l2-norm: 0.5150324702262878
2020-02-11 14:59:29,027 (model_tdnnf3:224) INFO: name: output_affine.weight, shape: torch.Size([4336, 256]), l2-norm: 16.972375869750977, np-l2-norm: 16.97245216369629
2020-02-11 14:59:29,027 (model_tdnnf3:224) INFO: name: output_affine.bias, shape: torch.Size([4336]), l2-norm: 0.8673517107963562, np-l2-norm: 0.8673520088195801
2020-02-11 14:59:29,028 (model_tdnnf3:224) INFO: name: prefinal_xent.affine.weight, shape: torch.Size([1024, 256]), l2-norm: 8.239212989807129, np-l2-norm: 8.239215850830078
2020-02-11 14:59:29,028 (model_tdnnf3:224) INFO: name: prefinal_xent.affine.bias, shape: torch.Size([1024]), l2-norm: 0.741665780544281, np-l2-norm: 0.7416657209396362
2020-02-11 14:59:29,029 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm1.weight, shape: torch.Size([1024]), l2-norm: 6.000321865081787, np-l2-norm: 6.00032377243042
2020-02-11 14:59:29,029 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm1.bias, shape: torch.Size([1024]), l2-norm: 2.1985473722452298e-05, np-l2-norm: 2.198547008447349e-05
2020-02-11 14:59:29,030 (model_tdnnf3:224) INFO: name: prefinal_xent.linear.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 9.000222206115723, np-l2-norm: 9.000225067138672
2020-02-11 14:59:29,030 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm2.weight, shape: torch.Size([256]), l2-norm: 29.329822540283203, np-l2-norm: 29.329822540283203
2020-02-11 14:59:29,031 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm2.bias, shape: torch.Size([256]), l2-norm: 2.5147366523742676, np-l2-norm: 2.5147361755371094
2020-02-11 14:59:29,032 (model_tdnnf3:224) INFO: name: output_xent_affine.weight, shape: torch.Size([4336, 256]), l2-norm: 30.42934799194336, np-l2-norm: 30.429412841796875
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: output_xent_affine.bias, shape: torch.Size([4336]), l2-norm: 0.7537439465522766, np-l2-norm: 0.7537445425987244
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: input_batch_norm.weight, shape: torch.Size([129]), l2-norm: 8.521322250366211, np-l2-norm: 8.521322250366211
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: input_batch_norm.bias, shape: torch.Size([129]), l2-norm: 1.5148330926895142, np-l2-norm: 1.5148330926895142
|
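A minimal sketch of how the per-parameter l2-norms listed above can be reproduced from a saved PyTorch checkpoint. It assumes the checkpoint is (or contains) a plain state_dict of tensors; the path is a placeholder, not the actual experiment directory.
import torch

# placeholder path, not the real experiment directory
checkpoint = torch.load('exp/chain/best_model.pt', map_location='cpu')
if 'state_dict' in checkpoint:  # some checkpoints wrap the parameters
    checkpoint = checkpoint['state_dict']

for name, param in checkpoint.items():
    if param.is_floating_point():
        print(f'name: {name}, shape: {tuple(param.shape)}, '
              f'l2-norm: {param.norm(p=2).item():.6f}')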
The log of the last iter or of all iters? This is the Norms log from Kaldi's tdnn_1c.
|
Yes, cool. What are the corresponding norms from the PyTorch model? If you have shown these already, I didn't notice them.
…On Tue, Feb 11, 2020 at 3:33 PM fanlu ***@***.***> wrote:
oh sorry for Kaldi's model.. it will be printed in the progress.N.log, search for Norm
On Tue, Feb 11, 2020 at 2:22 PM 付嘉懿 ***@***.***> wrote: which layer should I focus on? And is there a tool to get the l2-norm of Kaldi's parameters? Maybe this is a simple way: use the Kaldi tool "nnet-am-copy --binary=false final.mdl" to convert the mdl file to text mode and then write a script to compute the 2-norm of the weights.
The log of the last iter or of all iters? This is the Norms log from Kaldi's tdnn_1c.
./log/progress.74.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.4505 tdnnf2.linear:14.7518 tdnnf2.affine:13.3762 tdnnf3.linear:12.3981 tdnnf3.affine:11.0532 tdnnf4.linear:11.7087 tdnnf4.affine:10.0316 tdnnf5.linear:8.64949 tdnnf5.affine:8.20888 tdnnf6.linear:11.7272 tdnnf6.affine:10.3102 tdnnf7.linear:11.3739 tdnnf7.affine:10.0447 tdnnf8.linear:11.1174 tdnnf8.affine:9.76855 tdnnf9.linear:10.9489 tdnnf9.affine:9.57144 tdnnf10.linear:10.8642 tdnnf10.affine:9.30931 tdnnf11.linear:10.8142 tdnnf11.affine:9.08426 tdnnf12.linear:10.6601 tdnnf12.affine:9.06823 tdnnf13.linear:10.4231 tdnnf13.affine:9.131 prefinal-l:15.0584 prefinal-chain.affine:10.0323 prefinal-chain.linear:15.0632 output.affine:34.4237 prefinal-xent.affine:8.71721 prefinal-xent.linear:11.205 output-xent.affine:54.2259 ]
./log/progress.75.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.3429 tdnnf2.linear:14.6932 tdnnf2.affine:13.3021 tdnnf3.linear:12.3406 tdnnf3.affine:11.0014 tdnnf4.linear:11.6603 tdnnf4.affine:9.9918 tdnnf5.linear:8.60848 tdnnf5.affine:8.17265 tdnnf6.linear:11.6833 tdnnf6.affine:10.2691 tdnnf7.linear:11.3308 tdnnf7.affine:10.006 tdnnf8.linear:11.0748 tdnnf8.affine:9.73264 tdnnf9.linear:10.9096 tdnnf9.affine:9.53797 tdnnf10.linear:10.8255 tdnnf10.affine:9.27463 tdnnf11.linear:10.7736 tdnnf11.affine:9.04792 tdnnf12.linear:10.6172 tdnnf12.affine:9.03221 tdnnf13.linear:10.3811 tdnnf13.affine:9.0969 prefinal-l:14.9907 prefinal-chain.affine:9.98558 prefinal-chain.linear:14.9604 output.affine:34.3603 prefinal-xent.affine:8.66909 prefinal-xent.linear:11.1402 output-xent.affine:54.1951 ]
./log/progress.76.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.2866 tdnnf2.linear:14.6814 tdnnf2.affine:13.2642 tdnnf3.linear:12.3177 tdnnf3.affine:10.9801 tdnnf4.linear:11.6426 tdnnf4.affine:9.98039 tdnnf5.linear:8.59164 tdnnf5.affine:8.16007 tdnnf6.linear:11.6703 tdnnf6.affine:10.2559 tdnnf7.linear:11.3178 tdnnf7.affine:9.99525 tdnnf8.linear:11.0611 tdnnf8.affine:9.72302 tdnnf9.linear:10.899 tdnnf9.affine:9.53062 tdnnf10.linear:10.8144 tdnnf10.affine:9.26636 tdnnf11.linear:10.7627 tdnnf11.affine:9.03746 tdnnf12.linear:10.6021 tdnnf12.affine:9.02006 tdnnf13.linear:10.3647 tdnnf13.affine:9.08546 prefinal-l:14.9618 prefinal-chain.affine:9.96465 prefinal-chain.linear:14.8922 output.affine:34.3072 prefinal-xent.affine:8.64409 prefinal-xent.linear:11.1031 output-xent.affine:54.2753 ]
./log/progress.77.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.2199 tdnnf2.linear:14.6626 tdnnf2.affine:13.2239 tdnnf3.linear:12.2911 tdnnf3.affine:10.9535 tdnnf4.linear:11.6206 tdnnf4.affine:9.96432 tdnnf5.linear:8.57113 tdnnf5.affine:8.14308 tdnnf6.linear:11.6539 tdnnf6.affine:10.2378 tdnnf7.linear:11.2999 tdnnf7.affine:9.97964 tdnnf8.linear:11.0445 tdnnf8.affine:9.70736 tdnnf9.linear:10.8822 tdnnf9.affine:9.51659 tdnnf10.linear:10.7979 tdnnf10.affine:9.25141 tdnnf11.linear:10.7436 tdnnf11.affine:9.02005 tdnnf12.linear:10.5817 tdnnf12.affine:9.00242 tdnnf13.linear:10.3444 tdnnf13.affine:9.07074 prefinal-l:14.9253 prefinal-chain.affine:9.93753 prefinal-chain.linear:14.8181 output.affine:34.2569 prefinal-xent.affine:8.61303 prefinal-xent.linear:11.0593 output-xent.affine:54.323 ]
./log/progress.78.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.1474 tdnnf2.linear:14.6329 tdnnf2.affine:13.1756 tdnnf3.linear:12.2561 tdnnf3.affine:10.9222 tdnnf4.linear:11.5922 tdnnf4.affine:9.94353 tdnnf5.linear:8.5489 tdnnf5.affine:8.12412 tdnnf6.linear:11.6324 tdnnf6.affine:10.2153 tdnnf7.linear:11.2786 tdnnf7.affine:9.96117 tdnnf8.linear:11.0244 tdnnf8.affine:9.69122 tdnnf9.linear:10.8637 tdnnf9.affine:9.50082 tdnnf10.linear:10.7798 tdnnf10.affine:9.23452 tdnnf11.linear:10.7234 tdnnf11.affine:9.00197 tdnnf12.linear:10.5592 tdnnf12.affine:8.98375 tdnnf13.linear:10.3206 tdnnf13.affine:9.05327 prefinal-l:14.8827 prefinal-chain.affine:9.90754 prefinal-chain.linear:14.7399 output.affine:34.2076 prefinal-xent.affine:8.58043 prefinal-xent.linear:11.0129 output-xent.affine:54.3543 ]
|
I am drawing the corresponding norms of the PyTorch model, please wait a while. |
Thanks. Just the numbers would be fine-- no figure needed!
|
I must run this experiment again to log the norm at every iteration, since I only have the last norm of the PyTorch model. |
Just the last one is fine.
|
Let me merge this now, so we don't get too far out of sync. |
Here are the differences.
|
OK, interesting. They are very close. What were the final learning rates in each case, and what was the minibatch size in PyTorch? |
# TODO(fangjun): implement GeneralDropoutComponent in PyTorch
if self.linear.kernel_size == 3:
    x = self.bypass_scale * input_x[:, :, 1:-1:self.conv_stride] + x
Shouldn't this be c:-c:c rather than 1:-1:c, where c is self.conv_stride?
Suppose time_stride is 1 and conv_stride is 1.
If the input time indices are
0 1 2 3 4 5 6
then after self.linear the time indices will be
1 2 3 4 5
since the kernel shape is [-1, 0, 1] (time_stride == 1).
After self.affine the time indices are still
1 2 3 4 5
The index of input[1:-1:self.conv_stride] is [1, 2, 3, 4, 5], which matches the output of self.affine.
It is assumed that either time_stride == 1, conv_stride == 1 or time_stride == 0, conv_stride == 3.
So c:-c:c is equivalent to 1:-1:c when time_stride == 1 and conv_stride == 1.
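A small sketch of that index bookkeeping, using a plain nn.Conv1d as a stand-in for self.linear: with kernel_size 3 and stride 1 the convolution drops one frame at each end, so slicing the input with 1:-1 keeps the time axis aligned. The dimensions are illustrative only; in the actual code the bypass is added after self.affine restores the channel dimension, so only the time axis has to match here.
import torch
import torch.nn as nn

N, C, T = 2, 1024, 7
conv = nn.Conv1d(in_channels=C, out_channels=128, kernel_size=3, stride=1)

x = torch.randn(N, C, T)   # time indices 0 .. 6
y = conv(x)                # time indices 1 .. 5, i.e. T - 2 frames
assert y.shape[2] == T - 2

bypass = x[:, :, 1:-1]     # also time indices 1 .. 5
assert bypass.shape[2] == y.shape[2]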
I don't think it should be called time_stride here. Perhaps in the original Kaldi code it wasn't super clear, but when implemented as convolution it gets very confusing. Better to make (stride, kernel_size) the parameters and have them be (1, 3), (1, 3), ... (3, 3), (1, 1), (1, 3), (1, 3) ...
In any case, please revert other aspects of the implementation to be more similar to the way they were before and start doing experiments with that. I don't see much point starting from such a strange starting point (i.e. the way the code is right now).
I agree.
I also find them confusing but I wrote it this way to follow the naming style in Kaldi.
I'll change them now.
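A hedged sketch of that parameterization: each factored layer is described by an explicit (kernel_size, stride) pair, mirroring the values Dan lists (his notation puts stride first). The helper below is a stripped-down stand-in for illustration, not the classes in this PR; in particular its second convolution uses kernel_size 1 rather than the kernel_size 2 seen in the real TDNNF affine, and the 1024/128 dims are just the ones used elsewhere in this thread.
import torch.nn as nn

# (kernel_size, stride) per factored layer; (3, 3) is the stride-3 layer,
# (1, 1) the kernel-1 layer from the example above.
layer_specs = [(3, 1), (3, 1), (3, 1), (3, 3), (1, 1), (3, 1), (3, 1)]

def make_factored_layer(dim, bottleneck_dim, kernel_size, stride):
    # linear: dim -> bottleneck, carries the kernel size and stride;
    # affine: bottleneck -> dim; then ReLU and BatchNorm as in TDNNF.
    return nn.Sequential(
        nn.Conv1d(dim, bottleneck_dim, kernel_size=kernel_size,
                  stride=stride, bias=False),
        nn.Conv1d(bottleneck_dim, dim, kernel_size=1),
        nn.ReLU(),
        nn.BatchNorm1d(num_features=dim),
    )

tdnnfs = nn.ModuleList(
    [make_factored_layer(1024, 128, k, s) for k, s in layer_specs]
)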
stride=conv_stride)

# batchnorm requires [N, C, T]
self.batchnorm = nn.BatchNorm1d(num_features=dim)
It would be a closer match to what Kaldi's system is doing if you were to add affine=False wherever you use batchnorm.
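A minimal sketch of that change, assuming the 1024-dim layers used in this thread: Kaldi's BatchNormComponent has no learned scale and offset, so affine=False is the closer PyTorch equivalent.
import torch.nn as nn

dim = 1024
# no learnable weight/bias, only running mean/var statistics
batchnorm = nn.BatchNorm1d(num_features=dim, affine=False)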
This will be addressed in the next pull request.
We are trying to replace TDNN with TDNNF in kaldi pybind training with PyTorch.