
Loss increases and becomes nan #12

Open
YangangCao opened this issue Jul 23, 2021 · 9 comments

Comments

@YangangCao

Hi, thanks for your excellent work.
I extracted features from speech (pcm, 12 GB) and noise (pcm, 9 GB), and set count to 10000000. Then I ran run_train.py and got the following output:

[screenshot of training output]

Can you help me? Thanks again!
@jzi040941
Owner

Can you tell me which dataset you use for training?

@YangangCao
Author

YangangCao commented Jul 26, 2021

Can you tell me which dataset you use for training?

Hi, I made some changes, as follows:
First, I added some clean music data to the speech, since I want to keep the music when denoising.
Second, the speech and noise were resampled and re-encoded from non-original 48k wav sources (such as 8k/16k mp3).
Maybe this impacts the training result?

@YangangCao
Author

YangangCao commented Jul 27, 2021

I used original 48k speech (concatenated into a pcm, 15 GB) and noise (concatenated into a pcm, 7.8 GB), set count to 10000000, and got increasing loss and nan again.
When I set count to 100000, I get the following output:

[screenshots of training output]

It seems like the loss still increases per iteration but decreases per epoch; is that normal? When count is big, nan seems inevitable.

@YangangCao
Author

Hi, I found the problem. The reason for the increasing loss is the following:

            # print statistics
            running_loss += loss.item()

            # for testing
            print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss))

Actually, I don't quite understand why you wrote it like this...

The reason for the nan is CustomLoss.
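
For reference, a minimal sketch (not the repository's code) of the usual pattern: accumulate the loss, print the average over a reporting interval, then reset the accumulator so the printed value does not keep growing. log_every and the dummy loss values are illustrative only.

# minimal sketch of interval-averaged loss logging (illustrative, not the repo's code)
log_every = 100
running_loss = 0.0
for epoch in range(2):
    for i, loss_value in enumerate([0.9, 0.8, 0.7] * 100):  # dummy values standing in for loss.item()
        running_loss += loss_value
        if (i + 1) % log_every == 0:
            # report the mean loss over the last log_every iterations, then reset
            print('[%d, %5d] loss: %.3f' %
                    (epoch + 1, i + 1, running_loss / log_every))
            running_loss = 0.0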

@jzi040941
Owner

Hi @YangangCao
Yes, I was dumb; I only checked iter=1, epoch=1, and that's why I didn't notice the printed loss increasing across iterations.
I fixed it in commit 9de28e0.

For the nan error:
Did you check that the extracted features (r, g) are in [0, 1]? If not, they will make the loss nan unless you clip them to [0, 1].

Thanks
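
As a side note, a minimal sketch of such a clipping step, assuming the targets are a PyTorch tensor (torch.clamp is standard PyTorch; the tensor values here are illustrative, not the repository's data):

import torch

# illustrative targets; in practice these would be the extracted (g, r) features
targets = torch.tensor([[1.2, 0.5, -0.1],
                        [0.0, 0.9, 1.0]])

# clip into [0, 1] so terms like pow(x, 0.5) in the loss cannot produce nan
targets = torch.clamp(targets, min=0.0, max=1.0)
print(targets)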

@YangangCao
Author

Did you check that the extracted features (r, g) are in [0, 1]? If not, they will make the loss nan unless you clip them to [0, 1].

I have checked the features extracted from the original 48k wav; they all range from 0 to 1 (floating point values, lots of 0s and sparse 1s). When I set the count of extracted features to 1e5, no nan appears (I tried more than once). However, when I set it to 1e6 or 1e7, nan appears again. I am not sure about the relationship between count and nan.
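
If anyone wants to reproduce this range check, a minimal sketch assuming the dumped features/targets are raw 32-bit floats (the file name here is a placeholder, not the repository's actual output path):

import numpy as np

# placeholder file name; point this at the dump produced by the feature extraction step
x = np.fromfile('training_features.f32', dtype=np.float32)
print('min=%.4f max=%.4f nan count=%d' % (x.min(), x.max(), int(np.isnan(x).sum())))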

@Chen1399

Chen1399 commented Sep 7, 2021

The code has an error in 'rnn_train.py' that makes the loss nan.

rb = targets[:,:,:34]
gb = targets[:,:,34:68]

but in 'denoise.cpp' they are written as g first, then r:

fwrite(g, sizeof(float), NB_BANDS, f3);//gain    
fwrite(r, sizeof(float), NB_BANDS, f3);//filtering strength

rb < 0, so that torch.pow(gb, 0.5) is nan

You should change the code in 'rnn_train.py' to:

gb = targets[:,:,:34]
rb = targets[:,:,34:68]
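
A quick standalone check (not from the repository) showing why this matters: a negative input to torch.pow with exponent 0.5 yields nan, which is how the swapped slices end up poisoning the loss.

import torch

r_like = torch.tensor([0.3, -0.2, 0.0])  # illustrative values; r can be negative
print(torch.pow(r_like, 0.5))            # tensor([0.5477, nan, 0.0000])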

@jzi040941
Owner

The code has an error in 'rnn_train.py' that makes the loss nan.

Thanks, I've fixed it in #24.

@Chen1399

There is another cause of the nan loss: the pitch correlation feature can itself be nan. The value 'error' can be zero in 'celt_lpc.cpp', which makes the pitch correlation nan:

r = -SHL32(rr,3)/error;

You can add a small bias to 'error' so that the divisor is never zero:

r = -SHL32(rr,3)/(error + 0.00001);
