Error when training on a custom dataset with GPU on Windows 10 #14

Open
Arcofcosmos opened this issue Aug 28, 2021 · 0 comments
Following your method, I trained the model on my own dataset and got the following error:
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [11,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [11,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [11,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [11,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [11,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [11,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
RuntimeError: CUDA error: device-side assert triggered

I searched around online; some people say this is caused by out-of-range label indices in the dataset, but after adjusting that I found it probably isn't the cause. Also, if I turn off the GPU and run on the CPU, training works fine.
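For anyone hitting the same thing, here is a minimal sketch of the label-range check I mean, assuming the custom dataset yields (image, targets) pairs whose first target column is an integer class index; `dataset`, `num_classes`, and the target layout below are placeholders, not this repo's actual API:

```python
import torch

# Hypothetical sanity check: every class index must lie in [0, num_classes - 1],
# otherwise the classification loss can index out of range on the GPU.
num_classes = 20  # replace with the real number of classes

for i in range(len(dataset)):
    _, targets = dataset[i]              # assumes (image, targets) pairs
    targets = torch.as_tensor(targets)
    if targets.numel() == 0:
        continue                         # image with no boxes
    labels = targets[:, 0].long()        # assumes class index in column 0
    assert labels.min().item() >= 0 and labels.max().item() < num_classes, \
        f"sample {i} has out-of-range class index: {labels.tolist()}"
```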
I traced the error to here:
Traceback (most recent call last):
File "C:\Users\TuZhou\Desktop\yolox_new\yolox-pytorch-main\models\losses\yolox_loss.py", line 147, in get_losses
cls_preds, bbox_preds, obj_preds, targets, imgs,
File "D:\user\Software\Anaconda\lib\site-packages\torch\autograd\grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "C:\Users\TuZhou\Desktop\yolox_new\yolox-pytorch-main\models\losses\yolox_loss.py", line 284, in get_assignments
num_gt, fg_mask)
File "C:\Users\TuZhou\Desktop\yolox_new\yolox-pytorch-main\models\losses\yolox_loss.py", line 357, in dynamic_k_matching
print("ious_in_boxes_matrix", ious_in_boxes_matrix)
File "D:\user\Software\Anaconda\lib\site-packages\torch\tensor.py", line 179, in repr
return torch._tensor_str._str(self)
File "D:\user\Software\Anaconda\lib\site-packages\torch_tensor_str.py", line 372, in _str
return _str_intern(self)
File "D:\user\Software\Anaconda\lib\site-packages\torch_tensor_str.py", line 352, in _str_intern
tensor_str = _tensor_str(self, indent)
File "D:\user\Software\Anaconda\lib\site-packages\torch_tensor_str.py", line 241, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "D:\user\Software\Anaconda\lib\site-packages\torch_tensor_str.py", line 89, in init
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
It looks like values such as NaN are appearing, which is baffling, since the same run on the CPU works fine...
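For what it's worth, that assertion comes from CUDA binary_cross_entropy, which requires its input to lie in [0, 1], so NaN/inf predictions also trip it. A minimal debugging sketch, assuming the classification/objectness predictions go through sigmoid into the BCE-based cost in get_assignments; the call sites in the comments are hypothetical, not the repo's exact code:

```python
import os
import torch

# Force synchronous CUDA kernel launches so the device-side assert is raised
# at the offending op instead of at a later, unrelated line.
# Must be set before CUDA is initialized (ideally at the top of the training script).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

def check_probs(name, t):
    """Raise if a tensor of probabilities contains NaN/inf or leaves [0, 1]."""
    if not torch.isfinite(t).all():
        raise ValueError(f"{name} contains NaN/inf values")
    lo, hi = t.min().item(), t.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name} outside [0, 1]: min={lo}, max={hi}")

# Hypothetical usage just before the BCE-based cost in get_assignments:
# check_probs("cls_preds", cls_preds.sigmoid())
# check_probs("obj_preds", obj_preds.sigmoid())
```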
