'assert (boxes1[:, 2:] >= boxes1[:, :2]).all()' fails during training #89
Hi @PeizeSun,
I get the following error during training:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/detectron2/engine/launch.py", line 94, in _distributed_worker
main_func(*args)
File "/home/fujitake/works/SparseR-CNN/projects/SparseRCNN/train_net.py", line 128, in main
return trainer.train()
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 431, in train
super().train(self.start_iter, self.max_iter)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 138, in train
self.run_step()
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 441, in run_step
self._trainer.run_step()
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 332, in run_step
loss_dict = self.model(data)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fujitake/works/SparseR-CNN/projects/SparseRCNN/sparsercnn/detector.py", line 143, in forward
loss_dict = self.criterion(output, targets)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fujitake/works/SparseR-CNN/projects/SparseRCNN/sparsercnn/loss.py", line 147, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/fujitake/works/SparseR-CNN/projects/SparseRCNN/sparsercnn/loss.py", line 274, in forward
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
File "/home/fujitake/works/SparseR-CNN/projects/SparseRCNN/sparsercnn/loss.py", line 274, in <listcomp>
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
File "/home/fujitake/anaconda3/envs/protor/lib/python3.8/site-packages/scipy/optimize/_lsap.py", line 101, in linear_sum_assignment
a, b = _lsap_module.calculate_assignment(cost_matrix.T)
ValueError: matrix contains invalid numeric entries
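The assertion in the title checks that every box in (x0, y0, x1, y1) format satisfies x1 >= x0 and y1 >= y0; in DETR-derived code it typically guards generalized_box_iou. Because any comparison with NaN evaluates to False, NaN coordinates trip the assertion just like inverted corners do. That suggests (but does not confirm) that the assertion and the ValueError above may share a cause: non-finite predicted boxes. Below is a minimal NumPy sketch of a check that flags both cases. find_suspicious_boxes is a hypothetical helper, and the boxes are assumed to be an (N, 4) array or nested list.

```python
import numpy as np


def find_suspicious_boxes(boxes):
    """Return indices of boxes with inverted corners or non-finite coordinates.

    boxes: shape (N, 4) in (x0, y0, x1, y1) format (assumed).
    Comparisons with NaN are False, so a NaN coordinate also fails the
    (x1, y1) >= (x0, y0) check used by the assertion.
    """
    b = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    with np.errstate(invalid="ignore"):  # NaN comparisons may warn on some builds
        ordered = (b[:, 2:] >= b[:, :2]).all(axis=1)
    finite = np.isfinite(b).all(axis=1)
    return np.flatnonzero(~(ordered & finite)).tolist()
```

Running it on the predicted boxes just before the GIoU computation would show whether the offending rows are inverted or non-finite.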
Can you print out cost_matrix to see which entries are invalid?
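The following is a minimal sketch of that debugging step. It lists every NaN or infinite entry before the matrix reaches linear_sum_assignment. report_invalid_entries is a hypothetical helper, and the matrix (c[i] in the traceback) is assumed to be a CPU tensor or NumPy array.

```python
import numpy as np


def report_invalid_entries(cost_matrix):
    """Print and return (row, col) positions of NaN/inf entries in a 2-D cost matrix."""
    cost = np.asarray(cost_matrix, dtype=np.float64)
    bad = np.argwhere(~np.isfinite(cost))  # row-major order
    for row, col in bad:
        print(f"cost[{row}, {col}] = {cost[row, col]}")
    return [tuple(int(v) for v in idx) for idx in bad]
```

Calling report_invalid_entries(c[i]) right before linear_sum_assignment(c[i]) in loss.py would print the offending positions. Which non-finite values SciPy actually rejects (NaN only, or also infinities) may depend on the installed SciPy version, so the sketch reports all of them.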
I am getting the same error when I run Sparse-RCNN with a learning rate of 0.02 on 8 GPUs. Did you find a solution to this problem?
Thanks for your great work!
When I enabled AMP (automatic mixed precision) training in detectron2, I ran into problems with the boxes during training.
Changed
The difference from the original code is here.
Error
Decreasing the learning rate doesn't help, and the error occurs only with mixed-precision training.
Is there a good way to solve this problem?
Thank you.
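Since the error appears only with mixed precision, one plausible mechanism is float16's limited range and precision (this is an assumption, not confirmed in this thread). The smallest positive float16 value is about 6e-8, so an epsilon such as 1e-8 rounds to zero, and probabilities very close to 1 round to exactly 1.0. The NumPy sketch below assumes a focal-style matching cost term of the form -log(1 - p + 1e-8), as used by some DETR-family matchers. focal_neg_cost is a hypothetical helper, not code from this repository.

```python
import numpy as np


def focal_neg_cost(prob, dtype):
    """-log(1 - p + 1e-8), with p and the epsilon stored in `dtype`.

    Illustrative stand-in for the negative term of a focal-style matching
    cost; the formula is an assumption, not taken from this repository.
    """
    p = np.asarray(prob, dtype=dtype)
    eps = np.asarray(1e-8, dtype=dtype)  # rounds to 0.0 in float16
    with np.errstate(divide="ignore"):  # silence the log(0) warning
        return -np.log(1 - p + eps)


# A confidently predicted query: sigmoid(10) is about 0.99995, which
# rounds to exactly 1.0 in float16.
prob = 1.0 / (1.0 + np.exp(-10.0))
print(focal_neg_cost(prob, np.float16))  # inf: the epsilon vanished
print(focal_neg_cost(prob, np.float32))  # finite, roughly 10.0
```

If this is the cause, one option to try in PyTorch is computing the matching cost in float32. For example, cast the matcher inputs with .float() and run the matcher inside torch.cuda.amp.autocast(enabled=False). This is a suggestion to test, not a verified fix.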