Crash in MAS #24

Open
patriotyk opened this issue Feb 6, 2024 · 23 comments

Comments

@patriotyk

We are experiencing a strange issue. On one of our large datasets (about 300 hours), MAS crashes randomly. The core dump shows the following line:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007feaf9884576 in __pyx_f_5pflow_5utils_15monotonic_align_4core_maximum_path_each (__pyx_v_path=..., __pyx_v_value=..., __pyx_v_path=..., __pyx_v_value=..., __pyx_optional_args=0x0, __pyx_v_t_x=289, __pyx_v_t_y=279)
    at pflow/utils/monotonic_align/core.c:17615
17615       if (__pyx_t_7) {

We have tried everything, but nothing helped.
The only thing that worked was replacing MAS with AlignerNet, but that led to another issue: a crash at inference. Maybe the synthesis method requires some changes too?

I have successfully trained pflowtts on a single-speaker dataset, which is a subset of this bigger dataset, and it sounds great. A demo is here: https://tts.patriotyk.name

I have also built a Docker image and pushed it to a registry. It can be used to reproduce this issue; you just need to pull and run it. I can share the URL in a private message if you need it.

@p0p4k
Owner

p0p4k commented Feb 7, 2024

For debugging MAS: I am not good at C++ yet, but I can suggest one thing. Run just MAS (encoder + spectrogram); everything else in the batch can be deleted. At some batch it will fail. Open that batch and run one sample at a time, and you will find the sample that causes the problem.
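
One cheap check before re-running MAS sample by sample: the frame in the core dump above reports t_x=289 and t_y=279. If t_x is the number of text tokens and t_y the number of mel frames (an assumption about this function's argument order, worth checking against the Cython source that core.c was generated from), that sample has more tokens than frames, and a monotonic alignment that gives every token at least one frame cannot exist. The sketch below flags such samples. The function name and the (sample_id, text_len, mel_len) tuples are hypothetical and would have to be produced from the real dataset.

```python
def find_unalignable(samples):
    """Return ids of samples whose text is longer than their spectrogram.

    `samples` is an iterable of (sample_id, text_len, mel_len) tuples,
    where text_len counts tokens as fed to the encoder (including any
    inserted blanks) and mel_len counts spectrogram frames.
    """
    bad = []
    for sample_id, text_len, mel_len in samples:
        # A monotonic alignment needs at least one frame per token.
        if text_len > mel_len:
            bad.append(sample_id)
    return bad


# Hypothetical lengths; the second entry mirrors the values in the core dump.
samples = [("a.wav", 10, 50), ("b.wav", 289, 279), ("c.wav", 5, 5)]
print(find_unalignable(samples))
```

If this returns any ids, filtering those samples out (or checking why their audio is so short) may be worth trying first.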

@patriotyk
Author

Thank you for your fast answer.
We have found files that cause the crash, but they look normal. After removing such a file we are able to run training, but then it crashes on another one. It also may not crash for a few epochs, so all files have been used successfully for training, but then it crashes. Maybe it would be easier to switch to AlignerNet? I have uncommented the code that you commented out in the constructor and in the forward method, and commented out the call to MAS. This works fine: it trains without crashes, but inference crashes. Maybe you could help us with this? Do I need to change something in the synthesise method for it to work properly?

@p0p4k
Owner

p0p4k commented Feb 7, 2024

Sure, I'll fix AlignerNet synthesis this week. What is the error during inference?

@patriotyk
Author

Oh, you have edited your answer. It was a crash. If you need more info, I can try to run it again and tell you. But I think I may have made a mistake. Maybe you can push the changes to a separate branch and I will compare.

@p0p4k
Owner

p0p4k commented Feb 10, 2024

Someone else tried AlignerNet and it worked OK for them, so I am not sure how to debug this without the error message. If it's a crash, maybe it's a dataset issue? Does AlignerNet train on the small subset?

@Tera2Space

> I have uncommented the code that you commented out in the constructor and in the forward method, and commented out the call to MAS. This works fine: it trains without crashes, but inference crashes.

But AlignerNet isn't used during inference.

@p0p4k
Owner

p0p4k commented Feb 10, 2024

True. If it crashes during training, I think it is a dataset issue.

@patriotyk
Author

No, it doesn't crash during training. I got the crash in the synthesis method after loading the trained checkpoint.

@p0p4k
Owner

p0p4k commented Feb 10, 2024

Send some random input to the duration predictor. Does it predict anything?

@Tera2Space

> No, it doesn't crash during training. I got the crash in the synthesis method after loading the trained checkpoint.

Is the error something like "out of memory"?

@patriotyk
Author

> > No, it doesn't crash during training. I got the crash in the synthesis method after loading the trained checkpoint.
>
> Is the error something like "out of memory"?

This evening I will try to run it again and tell you more details.

@Tera2Space

> > > No, it doesn't crash during training. I got the crash in the synthesis method after loading the trained checkpoint.
> >
> > Is the error something like "out of memory"?
>
> This evening I will try to run it again and tell you more details.

If you want, we can talk on Telegram (https://t.me/TeraSpace); I speak Ukrainian and Russian.

@patriotyk
Author

@p0p4k I have tried again and yes, at inference I got an out-of-memory error, the same as @Tera2Space mentioned.

@p0p4k
Owner

p0p4k commented Feb 14, 2024

So, maybe try inference on a short sentence? Does that work? If it does, then it's just a memory issue and not a code issue.

@patriotyk
Author

It is a short sentence. No, it doesn't work. It runs for a very long time and then crashes. https://drive.google.com/file/d/1WaIYiloaf3oDVtkWb5LH8YN0XWW2KXbR/view?usp=drivesdk

@Tera2Space

Yep, same here. I believe AlignerNet didn't converge, so the duration predictor learned wrong alignments; at inference the audio then becomes very long and causes the out-of-memory error.
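
As a stopgap while the alignment itself is investigated, the predicted durations could be capped before they are expanded into frames. The sketch below assumes the duration predictor outputs log-durations that are turned into frame counts with ceil(exp(logw) * length_scale), as in Glow-TTS-style synthesis; that should be checked against the repository's synthesise method. The function name and the cap of 50 frames per token are arbitrary, hypothetical choices.

```python
import math


def frames_from_log_durations(log_durations, length_scale=1.0,
                              max_frames_per_token=50):
    """Convert predicted log-durations to clamped integer frame counts."""
    log_cap = math.log(max_frames_per_token)
    durations = []
    for logw in log_durations:
        # Cap before exponentiating so a wild prediction cannot overflow.
        capped = min(logw, log_cap)
        frames = math.ceil(math.exp(capped) * length_scale)
        # Keep at least one frame and at most the cap for every token.
        durations.append(max(1, min(frames, max_frames_per_token)))
    return durations


# The third value stands in for a runaway prediction from a bad aligner.
durations = frames_from_log_durations([0.0, 1.0, 40.0, -10.0])
print(durations, "total frames:", sum(durations))
```

This only bounds memory use. If the capped output is still garbage, the alignments the predictor was trained on are the more likely culprit.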

@Tera2Space

> So, maybe try inference on a short sentence? Does that work? If it does, then it's just a memory issue and not a code issue.

1500 GiB of VRAM... I think it's a code issue, because it happens at evaluation during training, whereas with MAS it works fine.

@p0p4k
Owner

p0p4k commented Feb 15, 2024

Give the model some random durations instead of using the duration predictor and try to see the output. (One duration integer per phoneme)
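
A rough illustration of what that means, in plain Python rather than the model's tensors: draw one integer per phoneme, then repeat each phoneme's encoder state that many times to get the frame-level sequence. The helper names are hypothetical, and how the expanded sequence is fed into the decoder depends on the repository's synthesise code, which is not shown here.

```python
import random


def random_durations(num_phonemes, low=2, high=8, seed=0):
    """Draw one integer duration (in frames) per phoneme."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(num_phonemes)]


def expand_by_durations(phoneme_states, durations):
    """Repeat each phoneme-level state durations[i] times (length regulation)."""
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames = []
    for state, duration in zip(phoneme_states, durations):
        frames.extend([state] * duration)
    return frames


# Placeholder strings stand in for per-phoneme encoder output vectors.
states = ["h", "e", "l", "o"]
durations = random_durations(len(states))
frames = expand_by_durations(states, durations)
print(durations, len(frames))
```

If the output sounds like speech with odd rhythm under random durations, the decoder is probably fine and the duration predictor (or the alignment it learned from) is the suspect.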

@patriotyk
Author

Sorry, but I don't know how to do that.

@Tera2Space

I will try later to clamp the output of the aligner.

@Tera2Space

Tera2Space commented Mar 7, 2024

Now I’m wondering whether the problem might be that we use the text encoder outputs as input to AlignerNet, and those outputs are passed through a convolution (to get the same dimensions as a mel frame). While I was testing the pitch predictor, it didn't work when conditioned on the output of the text encoder, but when I tried to use x_emb directly, it worked.

I will test this, and if it works I will create a PR.

@p0p4k
Owner

p0p4k commented Mar 7, 2024

> Now I’m wondering whether the problem might be that we use the text encoder outputs as input to AlignerNet, and those outputs are passed through a convolution (to get the same dimensions as a mel frame). While I was testing the pitch predictor, it didn't work when conditioned on the output of the text encoder, but when I tried to use x_emb directly, it worked.
>
> I will test this, and if it works I will create a PR.

Very interesting 🤔

@lexkoro

lexkoro commented May 3, 2024

Here is another wild guess. Would you mind trying the NumPy version of the maximum_path search?
https://github.com/coqui-ai/TTS/blob/dbf1a08a0d4e47fdad6172e433eeb34bc6b13b4e/TTS/tts/utils/helpers.py#L197

A long time ago I also had problems with segfaults. Running the training under gdb showed that they were related to maximum_path, and using the NumPy version fixed it.
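
For readers who want to see the idea without the Cython kernel, here is a minimal single-sample sketch of the monotonic alignment search dynamic program in NumPy. It is an illustrative rewrite, not the Coqui function linked above; the name maximum_path_single and the [t_x, t_y] (text tokens by mel frames) layout are assumptions. It checks the shape explicitly and raises a ValueError when there are more tokens than frames.

```python
import numpy as np


def maximum_path_single(value):
    """Best monotonic path through a [t_x, t_y] log-likelihood matrix.

    Returns a 0/1 matrix of the same shape in which every frame (column)
    is assigned to exactly one token (row) and tokens appear in order.
    """
    value = np.asarray(value, dtype=np.float64)
    t_x, t_y = value.shape
    if t_x > t_y:
        raise ValueError(f"more tokens ({t_x}) than frames ({t_y})")

    # q[i, j]: best score of a path ending with frame j assigned to token i.
    q = np.full((t_x, t_y), -np.inf)
    q[0, 0] = value[0, 0]
    for j in range(1, t_y):
        for i in range(min(t_x, j + 1)):
            stay = q[i, j - 1]                           # same token as before
            move = q[i - 1, j - 1] if i > 0 else -np.inf  # advance one token
            q[i, j] = value[i, j] + max(stay, move)

    # Backtrack from the last token at the last frame.
    path = np.zeros((t_x, t_y), dtype=np.int64)
    i = t_x - 1
    for j in range(t_y - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and (i == j or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return path


scores = [[0.0, 0.0, -5.0, -5.0],
          [-5.0, -5.0, 0.0, 0.0]]
print(maximum_path_single(scores))
```

The Python loops make this far slower than the compiled version, so it is mainly useful for confirming whether the crash goes away and for inspecting the inputs that trigger it.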
