
When I tried to start training, I got an error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #188

Open
TT-mouse opened this issue Mar 25, 2023 · 15 comments

Comments

@TT-mouse

Thank you very much for your contribution; your article has helped me a lot.
As mentioned in the title, I encountered an error at the beginning of training. The detailed error information is as follows:
    Traceback (most recent call last):
      File "E:/our code/edge-connect-master/train.py", line 2, in <module>
        main(mode=1)
      File "E:\our code\edge-connect-master\main.py", line 56, in main
        model.train()
      File "E:\our code\edge-connect-master\src\edge_connect.py", line 178, in train
        self.inpaint_model.backward(i_gen_loss, i_dis_loss)
      File "E:\our code\edge-connect-master\src\models.py", line 259, in backward
        gen_loss.backward()
      File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Have you ever encountered this error during training? I would appreciate it if you could tell me how to solve this problem!
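The hint at the end of the error can also be followed literally: enabling anomaly detection before training makes autograd name the forward operation whose output was later modified in place. A minimal sketch; the two-line train.py layout is inferred from the traceback above:

    import torch
    from main import main  # train.py's entry point, as shown in the traceback

    # Debug-only switch: training runs slower, but the RuntimeError will point
    # at the offending forward op instead of failing generically in backward().
    torch.autograd.set_detect_anomaly(True)

    main(mode=1)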

@talyeho

talyeho commented Mar 25, 2023

Hello, the following solution solved the problem:
https://stackoverflow.com/questions/71793678/i-am-running-into-a-gradient-computation-inplace-error
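In short, that answer says optimizer.step() updates parameters in place, so every backward() that still needs those parameters must run before any step(). A minimal, self-contained sketch of the safe ordering (toy modules for illustration only, not EdgeConnect's actual classes):

    import torch
    import torch.nn as nn

    # Toy generator/discriminator standing in for the real models.
    gen = nn.Linear(8, 8)
    dis = nn.Linear(8, 1)
    gen_optimizer = torch.optim.Adam(gen.parameters(), lr=1e-3)
    dis_optimizer = torch.optim.Adam(dis.parameters(), lr=1e-3)

    x = torch.randn(4, 8)
    fake = gen(x)
    dis_loss = -dis(fake.detach()).mean()  # stand-in discriminator loss
    gen_loss = dis(fake).mean()            # stand-in generator loss

    # The fix: run every backward() before any optimizer.step(), because
    # step() modifies parameters in place and bumps the version counter of
    # tensors the other backward pass may still need.
    dis_loss.backward(retain_graph=True)   # harmless here; needed when the losses share graph nodes
    gen_loss.backward()
    dis_optimizer.step()
    gen_optimizer.step()

In this toy example, moving dis_optimizer.step() up between the two backward() calls reproduces the same "modified by an inplace operation" error, because gen_loss.backward() still needs the discriminator weights at their old version.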

@TT-mouse
Author

Hello, thank you for your help.
I tried to modify the code according to the link you provided, but the same error still occurs. I've seen people say it's a PyTorch version issue, but I would like to fix it without downgrading PyTorch. Can you help me?

@talyeho

talyeho commented Mar 26, 2023

Did you move the optimizer steps to after the backward calls?

@TT-mouse
Author

Did you move the optimizer steps to after the backward calls?

Yes, I changed the backpropagation code as follows, but it didn't work.
[screenshots of the modified backward methods]

@talyeho

talyeho commented Mar 29, 2023

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

@Ghost0405

def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I got it running with this change, the results are not ideal and I don't know why; I look forward to our follow-up discussion.

@TT-mouse
Author

TT-mouse commented Apr 2, 2023

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

@TT-mouse
Author

TT-mouse commented Apr 2, 2023

def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I got it running with this change, the results are not ideal and I don't know why; I look forward to our follow-up discussion.

In which mode did the retrained model perform poorly?

@wizaaaard

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

@TT-mouse
Author

Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

Hello, the backward of both stage networks needs to be modified; earlier I had only changed the backward of the stage-two network. This might help you!

@Ghost0405

def backward(self, gen_loss=None, dis_loss=None):
    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

You can try it this way. Although I got it running with this change, the results are not ideal and I don't know why; I look forward to our follow-up discussion.

In which mode did the retrained model perform poorly?


Sorry, I'm only seeing your reply now. My problem occurs with model = 1; even after 540,000 training iterations, the results are still quite blurry.

@Ghost0405


Hello, and I apologize for the delay. This was the issue for me; I believe you should also check if dis_loss and gen_loss are not None before using them.

Thanks for your help, I have solved the problem.

Could you please tell me how you solved this problem? Thank you very much!

Hello, the backward of both stage networks needs to be modified; earlier I had only changed the backward of the stage-two network. This might help you!

Thanks for your reply, I only use the first stage of EdgeConnect, so I don't think it's a big problem if the second stage is not modified.

@manlupanshan

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

And I ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs look fine to me, so you can try it.

@CindyzhangKexin

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

And I ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs look fine to me, so you can try it.

Hello, I made the same changes, but the results were very poor and the numbers differ significantly from the original paper.

@manlupanshan

Hello! I changed models.py as follows:

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)
    if gen_loss is not None:
        gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

def backward(self, gen_loss=None, dis_loss=None):
    dis_loss.backward(retain_graph=True)
    gen_loss.backward()
    self.dis_optimizer.step()
    self.gen_optimizer.step()

And I ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs look fine to me, so you can try it.

Hello, I made the same changes, but the results were very poor and the numbers differ significantly from the original paper.

It's hard to say why this problem is happening. I ran Stage 1 and Stage 3 again, and the network generated reasonably good results.
The backward() calls should come before optimizer.step(), which is why I changed backward(self, gen_loss=None, dis_loss=None) as above. You may need to rethink why it does not work, or read the solution here:
https://stackoverflow.com/questions/71793678/i-am-running-into-a-gradient-computation-inplace-error
It may help you. Bye.
