When I tried to start training, I got an error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #188
Comments
Hello, the following solution solved the problem:
Hello, thank you for your help.
Did you move the optimizer step() calls so that they come after the backward() calls?
Hello, and sorry for the delay. Yes, that was the issue for me. I believe you should also check that dis_loss and gen_loss are not None before calling backward() on them.
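For readers following along, the reordering discussed above can be sketched as follows. This is a minimal, self-contained illustration, not the repository's actual models.py: the `gen` and `dis` modules, the toy losses, and the helper names `make_losses`, `backward_old`, and `backward_fixed` are all hypothetical stand-ins. It assumes the error comes from an optimizer `step()` (which updates weights in place) running before `gen_loss.backward()` has used those weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the generator and discriminator networks.
gen = nn.Linear(4, 4)
dis = nn.Linear(4, 1)
gen_optimizer = torch.optim.Adam(gen.parameters(), lr=1e-3)
dis_optimizer = torch.optim.Adam(dis.parameters(), lr=1e-3)


def make_losses():
    """Build toy GAN losses; gen_loss depends on the discriminator weights."""
    gen_optimizer.zero_grad()
    dis_optimizer.zero_grad()
    real = torch.randn(8, 4)
    fake = gen(torch.randn(8, 4))
    dis_loss = dis(fake.detach()).mean() - dis(real).mean()
    gen_loss = -dis(fake).mean()
    return gen_loss, dis_loss


def backward_old(gen_loss, dis_loss):
    """Step between the two backward calls (the ordering that raises)."""
    dis_loss.backward()
    dis_optimizer.step()   # in-place update of the discriminator weights
    gen_loss.backward()    # still needs the pre-update weights
    gen_optimizer.step()


def backward_fixed(gen_loss=None, dis_loss=None):
    """Run every backward pass first, then step the optimizers."""
    if dis_loss is not None:
        dis_loss.backward()
    if gen_loss is not None:
        gen_loss.backward()
    dis_optimizer.step()
    gen_optimizer.step()


try:
    backward_old(*make_losses())
    old_order_failed = False
except RuntimeError as err:
    old_order_failed = True
    print("old order:", str(err).splitlines()[0])

backward_fixed(*make_losses())
print("fixed order ran without error")
```

With this ordering, both graphs are consumed before any weight changes in place. The toy `dis_loss` uses `fake.detach()`; whether the real losses are built that way is an assumption to check against the actual code.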
You can try it this way. I got training to run with this change, but the results are not ideal and I do not know why. I look forward to discussing this further.
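One possible contributor to weaker results after this kind of change, offered as an assumption rather than something confirmed in this thread: if both `backward()` calls run before either optimizer steps, `gen_loss.backward()` also accumulates gradients into the discriminator's parameters, so the discriminator update is no longer driven by `dis_loss` alone. The sketch below uses hypothetical toy modules (`gen`, `dis`) to show the effect, along with one way to avoid it: update the generator first, then clear the discriminator gradients before its own backward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy networks, not the EdgeConnect models.
gen = nn.Linear(4, 4)
dis = nn.Linear(4, 1)
gen_optimizer = torch.optim.SGD(gen.parameters(), lr=0.1)
dis_optimizer = torch.optim.SGD(dis.parameters(), lr=0.1)

# Fixed inputs so the gradients below are directly comparable.
real = torch.randn(8, 4)
noise = torch.randn(8, 4)


def make_losses():
    gen_optimizer.zero_grad()
    dis_optimizer.zero_grad()
    fake = gen(noise)
    dis_loss = dis(fake.detach()).mean() - dis(real).mean()
    gen_loss = -dis(fake).mean()
    return gen_loss, dis_loss


# Reference: discriminator gradient produced by dis_loss alone.
_, dis_loss = make_losses()
dis_loss.backward()
clean_grad = dis.weight.grad.clone()

# Both backward passes before any step: gen_loss adds to dis.weight.grad.
gen_loss, dis_loss = make_losses()
dis_loss.backward()
gen_loss.backward()
mixed_grad = dis.weight.grad.clone()

# Alternative ordering: finish the generator update, then reset and
# recompute the discriminator gradient from dis_loss only.
gen_loss, dis_loss = make_losses()
gen_loss.backward()
gen_optimizer.step()        # modifies only generator weights
dis_optimizer.zero_grad()   # discard what gen_loss wrote into dis grads
dis_loss.backward()         # this toy dis_loss does not depend on gen weights
separated_grad = dis.weight.grad.clone()
dis_optimizer.step()

print("leak present:", not torch.allclose(clean_grad, mixed_grad))
print("leak removed:", torch.allclose(clean_grad, separated_grad))
```

The alternative ordering only works when `dis_loss` is computed from detached generator outputs, as in this toy setup. That is an assumption to verify in the real model before relying on it.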
Thanks for your help, I have solved the problem.
In which mode did retraining give poor results?
Could you please tell me how you solved this problem? Thank you very much!
Hello, the backward method of the networks in both phases needs to be modified. Earlier I had only changed the backward of the phase-two network. This might help you!
Sorry, I only just saw your reply. My problem occurs with model = 1. Although 540,000 training iterations have been performed, there is still a lot of blurriness in the results.
Thanks for your reply, I only use the first stage of EdgeConnect, so I don't think it's a big problem if the second stage is not modified.
Hello! I changed models.py as follows:
I then ran stage 3 of EdgeConnect for 20 epochs with PyTorch 1.7, and the outputs looked fine to me, so you can try it.
It is hard to say why this problem is happening. I ran Stage 1 and Stage 3 again, and the network generated reasonably good results.
Thank you very much for your contribution. Your article helps me a lot.
As mentioned in the title, I encountered an error at the beginning of my training. The detailed error information is as follows:
Traceback (most recent call last):
  File "E:/our code/edge-connect-master/train.py", line 2, in <module>
    main(mode=1)
  File "E:\our code\edge-connect-master\main.py", line 56, in main
    model.train()
  File "E:\our code\edge-connect-master\src\edge_connect.py", line 178, in train
    self.inpaint_model.backward(i_gen_loss, i_dis_loss)
  File "E:\our code\edge-connect-master\src\models.py", line 259, in backward
    gen_loss.backward()
  File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Anaconda\envs\CTSDG\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Have you ever encountered this error during training? I would appreciate it if you could tell me how to solve it!
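The hint at the end of the error message can be followed directly. The sketch below is a generic illustration built on a hypothetical toy computation, not the EdgeConnect training loop. It enables anomaly detection so that, when the backward pass fails, PyTorch also reports the forward-pass operation whose saved tensor was later modified. Anomaly detection slows training, so it is normally switched on only while debugging.

```python
import torch

# Record forward-pass tracebacks so a failing backward can point at the
# forward operation involved. This adds overhead; use it for debugging only.
torch.autograd.set_detect_anomaly(True)

w = torch.ones(3, requires_grad=True)
h = w * 2      # non-leaf tensor
y = h * h      # the multiplication saves h for the backward pass
h.add_(1)      # in-place edit of that saved tensor

failed = False
message = ""
try:
    y.sum().backward()
except RuntimeError as err:
    failed = True
    message = str(err)
    print(message.splitlines()[0])

# Switch anomaly detection off again once the culprit is found.
torch.autograd.set_detect_anomaly(False)
```

In a real training script, the same `set_detect_anomaly(True)` call would go before the training loop starts, and the extra traceback it prints should point at the layer whose weights were changed too early.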