NaN loss #31
Comments
The training environment used CUDA 11.8, PyTorch 2.1.2, and torchvision 0.16.2, with DDP training on four NVIDIA A100-SXM4 (80 GB) GPUs.
KSS dataset? It's snowing a lot today, so be careful!
Ah, it is not the KSS dataset; it is a multispeaker dataset! Maybe there is too much variance. Can you try taking a small subset of 3-4 speakers and training on that first?
In my case, I sometimes got a NaN loss because of dataset issues.
After changing the batch size to 64, the model is no longer showing any NaN loss. I will continue to monitor and share the results. Additionally, the original mel-spectrogram is added to TensorBoard with add_image without removing the zero-padding; it would be beneficial to trim the padding using the batch's y_lengths before logging. Lastly, while GPU usage was at 100% with the vits2 repo by p0p4k, this repo does not seem to utilize the GPU as efficiently. I wanted to ask whether there are any ongoing developments related to this. Thank you, as always, for your prompt responses.
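A minimal sketch of what that trimming could look like before logging, assuming a padded mel batch `y` of shape `(B, n_mels, T_max)`, a `y_lengths` tensor of valid frame counts, and a standard `SummaryWriter`; the function and variable names here are illustrative, not the repo's actual ones.

```python
import torch
from torch.utils.tensorboard import SummaryWriter


def log_ground_truth_mel(writer: SummaryWriter, y: torch.Tensor,
                         y_lengths: torch.Tensor, global_step: int) -> None:
    # y: (B, n_mels, T_max) zero-padded mel batch; y_lengths: (B,) valid frame counts.
    mel = y[0, :, : int(y_lengths[0])]                        # drop the padded frames of the first item
    mel = (mel - mel.min()) / (mel.max() - mel.min() + 1e-8)  # scale to [0, 1] for display
    writer.add_image("mel/ground_truth", mel.unsqueeze(0),    # (1, n_mels, T) image
                     global_step, dataformats="CHW")
```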
About GPU usage, it might be because of the dataloader. We might have to investigate that. Keep me updated with samples. Good day!
Try to disable fp16 and use fp32 |
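In case it helps, here is a minimal sketch of what a full-fp32 training step looks like with `torch.cuda.amp`; the `use_fp16` flag, `model`, `batch`, and `optimizer` are illustrative placeholders, not the repo's actual config keys or objects.

```python
import torch

use_fp16 = False  # keep everything in fp32 to avoid fp16 overflow in the attention scores
scaler = torch.cuda.amp.GradScaler(enabled=use_fp16)


def train_step(model, batch, optimizer):
    optimizer.zero_grad(set_to_none=True)
    # autocast is a no-op when enabled=False, so the forward pass stays in fp32
    with torch.cuda.amp.autocast(enabled=use_fp16):
        loss = model(batch)
    scaler.scale(loss).backward()  # with the scaler disabled this is a plain backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```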
That is due to the matmul of the query and key overflowing in float16. You can find a solution to that problem in Sec. 2.4 of this paper (https://arxiv.org/pdf/2105.13290.pdf); see Eq. 4.
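For reference, a rough sketch of that trick: scale the query down by a constant α before the matmul so the fp16 scores cannot overflow, subtract the row-wise max, then scale back up before the softmax (which is invariant to the shift). The names `q`, `k`, and the value of `alpha` are illustrative.

```python
import math
import torch


def fp16_safe_attention_weights(q: torch.Tensor, k: torch.Tensor,
                                alpha: float = 32.0) -> torch.Tensor:
    # q, k: (..., seq_len, d_k)
    d_k = q.size(-1)
    scores = torch.matmul(q / (alpha * math.sqrt(d_k)), k.transpose(-2, -1))
    # subtracting the per-row max does not change the softmax output, and dividing/multiplying
    # by alpha keeps the intermediate matmul result within the fp16 range
    scores = (scores - scores.max(dim=-1, keepdim=True).values.detach()) * alpha
    return torch.softmax(scores, dim=-1)
```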
Hello p0p4k,
I've begun training a Korean PFlow model using the code you shared. However, I encountered a NaN loss during training. I used a publicly available Korean dataset and structured the filelist in a single-speaker format of filename|text.
Although the dataset contains over 2000 speakers, it lacks speaker labels, so I trained it using a single-speaker setting. I understand that differences in data and preprocessing methods might lead to various issues, but if you have any insights into the potential causes of the NaN loss, I would greatly appreciate your advice.
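For context, a couple of made-up filelist lines in that filename|text format (the paths and sentences below are purely illustrative):

```
wavs/spk_unknown_000001.wav|안녕하세요, 만나서 반갑습니다.
wavs/spk_unknown_000002.wav|이 문장은 예시 문장입니다.
```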
It's snowing heavily in Korea right now. Have a great day.
At first, learning seems to be going well, but then suddenly something goes wrong.