
When I tried to start training, I got an error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #188

Open
TT-mouse opened this issue Mar 25, 2023 · 15 comments

Comments

@TT-mouse

Thank you very much for your contribution; your article has helped me a lot.
As mentioned in the title, I encountered an error at the beginning of training. The detailed error information is as follows:
    Traceback (most recent call last):
      File "E:/our code/edge-connect-master/train.py", line 2, in <module>
        main(mode=1)
      File "E:\our code\edge-connect-master\main.py", line 56, in main
        model.train()
      File "E:\our code\edge-connect-master\src\edge_connect.py", line 178, in train
        self.inpaint_model.backward(i_gen_loss, i_dis_loss)
      File "E:\our code\edge-connect-master\src\models.py", line 259, in backward
        gen_loss.backward()
      File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Have you ever encountered this error during training? I would appreciate it if you could tell me how to solve this problem!
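The hint at the end of the error can also be followed literally: enabling anomaly detection before training makes autograd name the forward operation whose output was later modified in place. A minimal sketch; the two-line train.py layout is inferred from the traceback above:

    import torch
    from main import main  # train.py's entry point, as shown in the traceback

    # Debug-only switch: training runs slower, but the RuntimeError will point
    # at the offending forward op instead of failing generically in backward().
    torch.autograd.set_detect_anomaly(True)

    main(mode=1)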

@talyeho

talyeho commented Mar 25, 2023

Hello, the following solution solved the problem:
https://stackoverflow.com/questions/71793678/i-am-running-into-a-gradient-computation-inplace-error
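In short, that answer says optimizer.step() updates parameters in place, so every backward() that still needs those parameters must run before any step(). A minimal, self-contained sketch of the safe ordering (toy modules for illustration only, not EdgeConnect's actual classes):

    import torch
    import torch.nn as nn

    # Toy generator/discriminator standing in for the real models.
    gen = nn.Linear(8, 8)
    dis = nn.Linear(8, 1)
    gen_optimizer = torch.optim.Adam(gen.parameters(), lr=1e-3)
    dis_optimizer = torch.optim.Adam(dis.parameters(), lr=1e-3)

    x = torch.randn(4, 8)
    fake = gen(x)
    dis_loss = -dis(fake.detach()).mean()  # stand-in discriminator loss
    gen_loss = dis(fake).mean()            # stand-in generator loss

    # The fix: run every backward() before any optimizer.step(), because
    # step() modifies parameters in place and bumps the version counter of
    # tensors the other backward pass may still need.
    dis_loss.backward(retain_graph=True)   # harmless here; needed when the losses share graph nodes
    gen_loss.backward()
    dis_optimizer.step()
    gen_optimizer.step()

In this toy example, moving dis_optimizer.step() up between the two backward() calls reproduces the same "modified by an inplace operation" error, because gen_loss.backward() still needs the discriminator weights at their old version.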

@TT-mouse
Author

Hello, thank you for your help.
I tried to modify the code according to the link you provided, but the same error still occurs. I've seen people say it's a PyTorch version issue, but I would like to fix it without downgrading PyTorch. Can you help me?

@talyeho

talyeho commented Mar 26, 2023

Did you move the optimizer steps to after the backward calls?

@TT-mouse
Author

Did you move the optimizer steps to after the backward calls?

Yes, I changed the backpropagation code as follows, but it didn't work.
[screenshots of the modified backward methods]

@talyeho

talyeho commented Mar 29, 2023

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

@Ghost0405

def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I got it running with this change, the results are not ideal and I don't know why; I look forward to our follow-up discussion.

@TT-mouse
Author

TT-mouse commented Apr 2, 2023

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

@TT-mouse
Author

TT-mouse commented Apr 2, 2023

def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I got it running with this change, the results are not ideal and I don't know why; I look forward to our follow-up discussion.

In which mode did the retrained model perform poorly?

@wizaaaard

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

@TT-mouse
Author

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

Hello, the backward of both stage networks needs to be modified; earlier I had only changed the backward of the stage-two network. This might help you!

@Ghost0405

def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I got it running with this change, the results are not ideal and I don't know why; I look forward to our follow-up discussion.

In which mode did the retrained model perform poorly?


Sorry, I'm only seeing your reply now. My problem occurs with model = 1; even after 540,000 training iterations, the results are still quite blurry.

@Ghost0405


Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

Hello, the backward of both stage networks needs to be modified; earlier I had only changed the backward of the stage-two network. This might help you!

Thanks for your reply, I only use the first stage of EdgeConnect, so I don't think it's a big problem if the second stage is not modified.

@manlupanshan

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

And I ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs look fine to me, so you can try it.

@CindyzhangKexin

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

And I ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs look fine to me, so you can try it.

Hello, I made the same changes, but the results were very poor and the numbers differ significantly from the original paper.

@manlupanshan

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

And I ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs look fine to me, so you can try it.

Hello, I made the same changes, but the results were very poor and the numbers differ significantly from the original paper.

It's hard to say why this problem is happening. I ran Stage 1 and Stage 3 again, and the network generated reasonably good results.
The backward() calls should come before optimizer.step(), which is why I changed backward(self, gen_loss=None, dis_loss=None) as above. You may need to rethink why it does not work, or read the solution here:
https://stackoverflow.com/questions/71793678/i-am-running-into-a-gradient-computation-inplace-error
It may help you. Bye.
