Monailabel error with Cuda and SegResNet. #1636
keyurradia asked this question in Q&A
-
Hi @keyurradia, thanks for opening this discussion. How many labels are you trying to segment? This error appears to come from a difference between the number of labels in your config and the number the pre-trained model was trained with. Have you disabled pre-trained model usage? If not, please delete all files within the radiology/model folder and disable pre-trained model usage before triggering the training. Let us know how it goes.
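To make the mismatch concrete, here is a minimal sketch in plain Python (no PyTorch needed) that compares the shapes reported in the error message. The label names come from the "Save Label params" log entry in the question; the helper names `expected_out_channels` and `find_shape_mismatches` are hypothetical, and the "+1 for background" rule is an assumption that happens to match the 7 output channels shown in the log.

```python
# Labels as saved by the client in the question's log (6 custom labels).
labels = {
    "gallbladder": 1,
    "liver": 2,
    "inferior vena cava": 3,
    "portal vein and splenic vein": 4,
    "vessels": 5,
    "lesion": 6,
}


def expected_out_channels(label_map, include_background=True):
    """Assumed rule: one output channel per label, plus one for background."""
    return len(label_map) + (1 if include_background else 0)


def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for shared keys whose shapes differ."""
    return {
        key: (checkpoint_shapes[key], model_shapes[key])
        for key in checkpoint_shapes
        if key in model_shapes and checkpoint_shapes[key] != model_shapes[key]
    }


# Shapes copied from the RuntimeError: the checkpoint has 3 output channels.
checkpoint_shapes = {
    "conv_final.2.conv.weight": (3, 32, 1, 1, 1),
    "conv_final.2.conv.bias": (3,),
}

n_out = expected_out_channels(labels)
model_shapes = {
    "conv_final.2.conv.weight": (n_out, 32, 1, 1, 1),
    "conv_final.2.conv.bias": (n_out,),
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for key, (ckpt_shape, cur_shape) in mismatches.items():
    print(f"{key}: checkpoint {ckpt_shape} vs current model {cur_shape}")
```

Only the final layer differs, which is why a checkpoint trained with a different label count cannot be loaded into the new model as-is.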
-
Hi, I am trying to train a segmentation model with custom labels. I changed the model config according to the instructions.
I have a 16 GB GPU, and I installed CUDA and PyTorch versions that are compatible with each other.
One error appears and disables "cuda",
and the other error is "RuntimeError: Error(s) in loading state_dict for SegResNet".
Here is the log from my conda environment. Can someone please help me find what I am doing wrong?
[2024-02-09 10:34:49,391] [30420] [MainThread] [INFO] (monailabel.endpoints.activelearning:44) - Active Learning Request: {'strategy': 'first', 'client_id': 'user-xyz'}
[2024-02-09 10:34:49,392] [30420] [MainThread] [INFO] (monailabel.tasks.activelearning.first:38) - First: Selected Image: hepaticvessel_004
[2024-02-09 10:34:49,428] [30420] [MainThread] [INFO] (monailabel.endpoints.activelearning:60) - Next sample: {'id': 'hepaticvessel_004', 'path': 'C:\Users\keyur\datasets\Task08_HepaticVessel\imagesTr\hepaticvessel_004.nii.gz', 'ts': 1707328026, 'name': 'hepaticvessel_004.nii.gz'}
[2024-02-09 10:36:49,455] [30420] [MainThread] [INFO] (monailabel.endpoints.datastore:101) - Saving Label for hepaticvessel_004 for tag: final by admin
[2024-02-09 10:36:49,458] [30420] [MainThread] [INFO] (monailabel.endpoints.datastore:112) - Save Label params: {"label_info": [{"name": "gallbladder", "idx": 1}, {"name": "liver", "idx": 2}, {"name": "inferior vena cava", "idx": 3}, {"name": "portal vein and splenic vein", "idx": 4}, {"name": "vessels", "idx": 5}, {"name": "lesion", "idx": 6}], "client_id": "user-xyz"}
[2024-02-09 10:36:49,459] [30420] [MainThread] [INFO] (monailabel.datastore.local:486) - Saving Label for Image: hepaticvessel_004; Tag: final; Info: {'label_info': [{'name': 'gallbladder', 'idx': 1}, {'name': 'liver', 'idx': 2}, {'name': 'inferior vena cava', 'idx': 3}, {'name': 'portal vein and splenic vein', 'idx': 4}, {'name': 'vessels', 'idx': 5}, {'name': 'lesion', 'idx': 6}], 'client_id': 'user-xyz'}
[2024-02-09 10:36:49,460] [30420] [MainThread] [INFO] (monailabel.datastore.local:494) - Adding Label: hepaticvessel_004 => final => C:\Users\keyur\AppData\Local\Temp\tmpmuo47wzs.nii.gz
[2024-02-09 10:36:49,468] [30420] [MainThread] [INFO] (monailabel.datastore.local:510) - Label Info: {'label_info': [{'name': 'gallbladder', 'idx': 1}, {'name': 'liver', 'idx': 2}, {'name': 'inferior vena cava', 'idx': 3}, {'name': 'portal vein and splenic vein', 'idx': 4}, {'name': 'vessels', 'idx': 5}, {'name': 'lesion', 'idx': 6}], 'client_id': 'user-xyz', 'ts': 1707471409, 'name': 'hepaticvessel_004.nii.gz'}
[2024-02-09 10:36:49,491] [30420] [MainThread] [INFO] (monailabel.interfaces.app:493) - New label saved for: hepaticvessel_004 => hepaticvessel_004
[2024-02-09 10:36:49,748] [30420] [Thread-1] [INFO] (monailabel.datastore.local:577) - Invalidate count: 0
[2024-02-09 10:37:04,944] [30420] [MainThread] [INFO] (monailabel.utils.async_tasks.task:41) - Train request: {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cpu', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz'}
[2024-02-09 10:37:04,946] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:49) - Before:: C:\Users\keyur\anaconda3\envs;;C:\Users\keyur\apps\radiology
[2024-02-09 10:37:04,947] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:53) - After:: C:\Users\keyur\anaconda3\envs;;C:\Users\keyur\apps\radiology
[2024-02-09 10:37:04,947] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:65) - COMMAND:: C:\Users\keyur\anaconda3\envs\monailabel-env\python.exe -m monailabel.interfaces.utils.app -m train -r {"model":"segmentation","name":"train_01","pretrained":true,"device":"cpu","max_epochs":50,"early_stop_patience":-1,"val_split":0.2,"train_batch_size":1,"val_batch_size":1,"multi_gpu":true,"gpus":"all","dataset":"SmartCacheDataset","dataloader":"ThreadDataLoader","tracking":"mlflow","tracking_uri":"","tracking_experiment_name":"","client_id":"user-xyz"}
[2024-02-09 10:37:05,218] [37016] [MainThread] [INFO] (main:37) - Initializing App from: C:\Users\keyur\apps\radiology; studies: C:\Users\keyur\datasets\Task08_HepaticVessel\imagesTr; conf: {'models': 'segmentation'}
[2024-02-09 10:37:09,348] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for MONAILabelApp Found: <class 'main.MyApp'>
[2024-02-09 10:37:09,354] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepedit.DeepEdit'>
[2024-02-09 10:37:09,355] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_2d.Deepgrow2D'>
[2024-02-09 10:37:09,355] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_3d.Deepgrow3D'>
[2024-02-09 10:37:09,355] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_spine.LocalizationSpine'>
[2024-02-09 10:37:09,356] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_vertebra.LocalizationVertebra'>
[2024-02-09 10:37:09,356] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation.Segmentation'>
[2024-02-09 10:37:09,357] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_spleen.SegmentationSpleen'>
[2024-02-09 10:37:09,357] [37016] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_vertebra.SegmentationVertebra'>
[2024-02-09 10:37:09,357] [37016] [MainThread] [INFO] (main:93) - +++ Adding Model: segmentation => lib.configs.segmentation.Segmentation
[2024-02-09 10:37:09,414] [37016] [MainThread] [INFO] (main:96) - +++ Using Models: ['segmentation']
[2024-02-09 10:37:09,414] [37016] [MainThread] [INFO] (monailabel.interfaces.app:135) - Init Datastore for: C:\Users\keyur\datasets\Task08_HepaticVessel\imagesTr
[2024-02-09 10:37:09,414] [37016] [MainThread] [INFO] (monailabel.datastore.local:130) - Auto Reload: False; Extensions: ['.nii.gz', '.nii', '.nrrd', '.jpg', '.png', '.tif', '.svs', '.xml']
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.datastore.local:577) - Invalidate count: 0
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (main:126) - +++ Adding Inferer:: segmentation => <lib.infers.segmentation.Segmentation object at 0x0000015FB794B1D0>
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (main:191) - {'segmentation': <lib.infers.segmentation.Segmentation object at 0x0000015FB794B1D0>, 'Histogram+GraphCut': <monailabel.scribbles.infer.HistogramBasedGraphCut object at 0x0000015FB7BCFA90>, 'GMM+GraphCut': <monailabel.scribbles.infer.GMMBasedGraphCut object at 0x0000015FB7B94250>}
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (main:206) - +++ Adding Trainer:: segmentation => <lib.trainers.segmentation.Segmentation object at 0x0000015FB7B95E10>
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.utils.sessions:51) - Session Path: C:\Users\keyur.cache\monailabel\sessions
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.utils.sessions:52) - Session Expiry (max): 3600
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:432) - Train Request (input): {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cpu', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz', 'local_rank': 0}
[2024-02-09 10:37:09,471] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:445) - CUDA_VISIBLE_DEVICES: None
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:450) - Distributed/Multi GPU is limited
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:465) - Distributed Training = FALSE
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:492) - 0 - Train Request (final): {'name': 'train_01', 'pretrained': True, 'device': 'cpu', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': False, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'model': 'segmentation', 'client_id': 'user-xyz', 'local_rank': 0, 'run_id': '20240209_103709'}
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:625) - 0 - Using Device: cpu; IDX: None
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:518) - Run/Output Path: C:\Users\keyur\apps\radiology\model\segmentation\train_01
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:534) - Tracking: mlflow
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:535) - Tracking URI: file:///C:/Users/keyur/apps/radiology/model/segmentation/train_01/mlruns;
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:536) - Tracking Experiment Name: segmentation; Run Name: run_20240209_103709
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:410) - Total Records for Training: 2
[2024-02-09 10:37:09,472] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:411) - Total Records for Validation: 1
monai.transforms.croppad.dictionary CropForegroundd.__init__:allow_smaller: Current default value of argument `allow_smaller=True` has been deprecated since version 1.2. It will be changed to `allow_smaller=False` in version 1.5.
Loading dataset: 0%| | 0/1 [00:00<?, ?it/s]
Loading dataset: 100%|##########| 1/1 [00:01<00:00, 1.85s/it]
Loading dataset: 100%|##########| 1/1 [00:01<00:00, 1.85s/it]
cache_num is greater or equal than dataset length, fall back to regular monai.data.CacheDataset.
[2024-02-09 10:37:11,341] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:328) - 0 - Records for Validation: 1
[2024-02-09 10:37:11,345] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:318) - 0 - Adding Validation to run every '1' interval
[2024-02-09 10:37:11,347] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:713) - 0 - Load Path C:\Users\keyur\apps\radiology\model\segmentation\train_01\model.pt
Loading dataset: 0%| | 0/2 [00:00<?, ?it/s]
Loading dataset: 50%|##### | 1/2 [00:02<00:02, 2.34s/it]
Loading dataset: 100%|##########| 2/2 [00:04<00:00, 2.24s/it]
Loading dataset: 100%|##########| 2/2 [00:04<00:00, 2.26s/it]
[2024-02-09 10:37:15,868] [37016] [MainThread] [INFO] (monailabel.tasks.train.basic_train:264) - 0 - Records for Training: 2
torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
[2024-02-09 10:37:15,872] [37016] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:876) - Engine run resuming from iteration 0, epoch 0 until 50 epochs
[2024-02-09 10:37:15,938] [37016] [MainThread] [ERROR] (ignite.engine.engine.SupervisedTrainer:992) - Engine run is terminating due to exception: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
2024-02-09 10:37:15,938 - ERROR - Exception: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
Traceback (most recent call last):
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 946, in _internal_run_as_gen
self._fire_event(Events.STARTED)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(*first, *(event_args + others), **kwargs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\handlers\checkpoint_loader.py", line 147, in call
Checkpoint.load_objects(to_load=self.load_dict, checkpoint=checkpoint, strict=self.strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 635, in load_objects
_load_object(obj, checkpoint_obj[k])
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 620, in _load_object
obj.load_state_dict(chkpt_obj, strict=is_state_dict_strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\interfaces\utils\app.py", line 128, in
run_main()
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\interfaces\utils\app.py", line 113, in run_main
result = a.train(request)
^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\interfaces\app.py", line 423, in train
result = task(request, self.datastore())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\tasks\train\basic_train.py", line 466, in call
res = self.train(0, world_size, req, datalist)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monailabel\tasks\train\basic_train.py", line 555, in train
context.trainer.run()
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\engines\trainer.py", line 53, in run
super().run()
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\engines\workflow.py", line 283, in run
super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 892, in run
return self._internal_run()
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run
return next(self._internal_run_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen
self._handle_exception(e)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception
self._fire_event(Events.EXCEPTION_RAISED, e)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(*first, *(event_args + others), **kwargs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\handlers\stats_handler.py", line 202, in exception_raised raise e
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 946, in _internal_run_as_gen
self._fire_event(Events.STARTED)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(*first, *(event_args + others), **kwargs)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\monai\handlers\checkpoint_loader.py", line 147, in call
Checkpoint.load_objects(to_load=self.load_dict, checkpoint=checkpoint, strict=self.strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 635, in load_objects
_load_object(obj, checkpoint_obj[k])
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\ignite\handlers\checkpoint.py", line 620, in _load_object
obj.load_state_dict(chkpt_obj, strict=is_state_dict_strict)
File "C:\Users\keyur\anaconda3\envs\monailabel-env\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SegResNet:
size mismatch for conv_final.2.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([7, 32, 1, 1, 1]).
size mismatch for conv_final.2.conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([7]).
[2024-02-09 10:37:16,472] [30420] [ThreadPoolExecutor-1_0] [INFO] (monailabel.utils.async_tasks.utils:83) - Return code: 1
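On the CUDA part of the question: in the log above the train request was submitted with 'device': 'cpu', and the GradScaler line is a warning that PyTorch could not see a CUDA device in that environment. A minimal sketch for checking this, assuming it is run with the same python.exe shown in the COMMAND line; the function name `cuda_report` is hypothetical, and the `torch` attributes it reads are standard PyTorch ones.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the PyTorch install in the current environment."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not importable from this interpreter at all.
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the installed PyTorch is a CPU-only build.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` is False, the GradScaler warning is expected and training will run on the CPU regardless of the GPU in the machine.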