Problems encountered when using multiple GPUs for training #95

123456789live · 2021-12-03T13:32:57Z

Dear author, I ran into this problem when training on two GPUs. How can I solve it?
(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

Traceback (most recent call last):
  File "train.py", line 197, in <module>
    main()
  File "train.py", line 72, in main
    assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    main()
  File "train.py", line 72, in main
    assert args.batch_size % total_gpus == 0, 'Batch size should match the number of gpus'
AssertionError: Batch size should match the number of gpus
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/omnisky/zq/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/omnisky/zq/bin/python3', '-u', 'train.py', '--local_rank=1', '--launcher', 'pytorch', '--batch_size', '2', '--cfg_file', 'cfgs/kitti_models/CaDDN.yaml']' returned non-zero exit status 1.
(zq) omnisky@node01:/data01/zq/CaDDN/tools$ python -m torch.distributed.launch --nproc_per_node=2 train.py --launcher pytorch --batch_size 2 --cfg_file cfgs/kitti_models/CaDDN.yaml^C
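
For reference, the check that fails on line 72 of train.py is `args.batch_size % total_gpus == 0`. With `--batch_size 2` this can only pass when `total_gpus` is 1 or 2, so the log suggests `total_gpus` was larger than 2 in this run. The traceback does not show how `total_gpus` is computed. One possibility, which is only an assumption here, is that it counts every GPU visible on the machine rather than the `--nproc_per_node` value. The sketch below mirrors the assertion in isolation; `check_batch_size` and `passes` are hypothetical helper names, not functions from the repository.

```python
def check_batch_size(batch_size: int, total_gpus: int) -> int:
    """Mirror the divisibility check from the traceback.

    Returns the per-GPU batch size when the check passes.
    Hypothetical helper, not part of the repository.
    """
    if batch_size % total_gpus != 0:
        raise AssertionError('Batch size should match the number of gpus')
    return batch_size // total_gpus


def passes(batch_size: int, total_gpus: int) -> bool:
    """Return True if the check would pass for this combination."""
    try:
        check_batch_size(batch_size, total_gpus)
        return True
    except AssertionError:
        return False


if __name__ == "__main__":
    # Assumed GPU counts, for illustration only.
    for total_gpus in (1, 2, 4, 8):
        print(f"batch_size=2, total_gpus={total_gpus}: passes={passes(2, total_gpus)}")
```

If the GPU count is indeed taken from the visible devices, then either limiting visibility (for example with the `CUDA_VISIBLE_DEVICES` environment variable) or choosing a batch size divisible by the number of GPUs should satisfy the check.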

fgqile · 2022-01-19T12:04:46Z

I ran into this too.
