Hi, I installed the environment on Ubuntu 18.04. I first ran the command:

```
python main.py --config configs/config.inria_dataset_osm_aligned.unet_resnet101_pretrained
```

After training finished, I ran:

```
python main.py --config configs/config.inria_dataset_osm_aligned.unet_resnet101_pretrained --mode eval
```

The program then hangs with the following output:
```
INFO: Loading defaults from configs/config.defaults.inria_dataset_osm_aligned.json
INFO: Loading defaults from configs/config.defaults.json
INFO: Loading defaults from configs/loss_params.json
INFO: Loading defaults from configs/optim_params.json
INFO: Loading defaults from configs/polygonize_params.json
INFO: Loading defaults from configs/dataset_params.inria_dataset_osm_aligned.json
INFO: Loading defaults from configs/eval_params.inria_dataset.json
INFO: Loading defaults from configs/eval_params.defaults.json
INFO: Loading defaults from configs/backbone_params.unet_resnet101.json
GPU 0 -> Using data from /gimastorage/Xiaoyu/data/AerialImageDataset
INFO: annotations will be loaded from disk
# --- Start evaluating ---#
Saving eval outputs to /gimastorage/Xiaoyu/data/AerialImageDataset/eval_runs/inria_dataset_osm_aligned.unet_resnet101_pretrained | 2020-12-05 09:55:09
Loading best val checkpoint: /home/sunx/Polygonization-by-Frame-Field-Learning/frame_field_learning/runs/inria_dataset_osm_aligned.unet_resnet101_pretrained | 2020-12-05 09:55:09/checkpoints/checkpoint.best_val.epoch_000001.tar
Eval test: 0%| | 0/34 [00:00<?, ?it/s]Traceback (most recent call last):
```
It just stays stuck there; if I stop the process, it gives the following errors:
```
Process SpawnProcess-2:
Traceback (most recent call last):
File "/home/sunx/Polygonization-by-Frame-Field-Learning/main.py", line 387, in
Traceback (most recent call last):
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/child_processes.py", line 75, in eval_process
evaluate(gpu, config, shared_dict, barrier, eval_ds, backbone)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/frame_field_learning/evaluate.py", line 62, in evaluate
evaluator.evaluate(split_name, eval_ds)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/frame_field_learning/evaluator.py", line 85, in evaluate
inference.inference_with_patching(self.config, self.model, tile_data)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/frame_field_learning/inference.py", line 79, in inference_with_patching
assert len(tile_data["image"].shape) == 4 and tile_data["image"].shape[0] == 1,
AssertionError: When using inference with patching, tile_data should have a batch size of 1, with image's shape being (1, C, H, W), not torch.Size([6, 3, 725, 725])
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 26, in _wrap
sys.exit(1)
SystemExit: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
util._exit_function()
main() File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
p.join()
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/main.py", line 381, in main
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
launch_eval(args)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/main.py", line 321, in launch_eval
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/sunx/Polygonization-by-Frame-Field-Learning/lydorn_utils/lydorn_utils/async_utils.py", line 8, in async_func_wrapper
if not out_queue.empty():
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/queues.py", line 123, in empty
return not self._poll()
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
r = wait([self], timeout)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/connection.py", line 924, in wait
selector.register(obj, selectors.EVENT_READ)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/selectors.py", line 352, in register
key = super().register(fileobj, events, data)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/selectors.py", line 244, in register
self._fd_to_key[key.fd] = key
KeyboardInterrupt
torch.multiprocessing.spawn(eval_process, nprocs=args.gpus, args=(config, shared_dict, barrier))
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 75, in join
ready = multiprocessing.connection.wait(
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/multiprocessing/connection.py", line 930, in wait
ready = selector.select(timeout)
File "/home/sunx/anaconda3/envs/frame_field1/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
Eval test: 0%| | 0/34 [13:02<?, ?it/s]
Process finished with exit code 130
```
I looked at the code in the inference file:
```python
def inference_with_patching(config, model, tile_data):
    assert len(tile_data["image"].shape) == 4 and tile_data["image"].shape[0] == 1, \
        f"When using inference with patching, tile_data should have a batch size of 1, " \
        f"with image's shape being (1, C, H, W), not {tile_data['image'].shape}"
```
Here the assert requires tile_data to hold a single full tile (batch size 1, shape (1, C, H, W)), which is different from the batch of patches it actually receives during eval.
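If I understand the intent correctly, inference with patching expects one full tile at a time and cuts it into patches itself, roughly like this sketch (my own illustration, not the repo's actual implementation; the patch_size/stride values are made up):

```python
import torch

def split_tile_into_patches(tile, patch_size, stride):
    # tile: (1, C, H, W) -- a single full image, batch size 1
    assert tile.shape[0] == 1, "one tile at a time"
    _, c, h, w = tile.shape
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(tile[:, :, top:top + patch_size, left:left + patch_size])
    return torch.cat(patches, dim=0)  # (n_patches, C, patch_size, patch_size)

# A (1, 3, 1450, 1450) tile satisfies the batch-size-1 assumption and gets split here,
# whereas the (6, 3, 725, 725) batch from my error is already a stack of patches
# and fails the assert in inference_with_patching.
tile = torch.zeros(1, 3, 1450, 1450)
patches = split_tile_into_patches(tile, patch_size=725, stride=725)
print(patches.shape)  # torch.Size([4, 3, 725, 725])
```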
I ran the eval command twice; the output above is from the second run, so there is no log about the patching process. The first time, it patches the test data first.
The other thing I did was reduce the dataset size by changing the CITY_METADATA_DICT inside inria_aerial.py.
But I then faced another problem immediately after, raised in inference.py:

```
RuntimeError: The size of tensor a (1024) must match the size of tensor b (299) at non-singleton dimension 3
```
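As far as I can tell, this is PyTorch's generic shape-mismatch error when two tensors are combined elementwise; a minimal reproduction with made-up shapes (not the actual tensors from inference.py) would be:

```python
import torch

# Made-up shapes just to reproduce the error message: the last dimensions
# (1024 vs 299) cannot be broadcast together.
a = torch.zeros(1, 2, 1024, 1024)
b = torch.zeros(1, 2, 299, 299)
a + b  # RuntimeError: The size of tensor a (1024) must match the size of tensor b (299)
       # at non-singleton dimension 3
```

So my guess is that the configured patch_size (1024) no longer matches the actual image size (299) after I reduced the dataset.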
I tried changing the patch_size in the config file to 299, but that leads to yet another error. I would be glad if someone who has come across this issue could shed some light on it.