cnncv.o75104

/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py:188: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:root:Number of representations:  414
INFO:root:Number of representations:  414
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:Dataset name:  iontransporters_membraneproteins
INFO:root:Dataset name:  iontransporters_membraneproteins
INFO:root:Dataset type:  balanced
INFO:root:Dataset type:  balanced
INFO:root:Dataset number:  10
INFO:root:Dataset number:  10
INFO:root:Representation type:  finetuned
INFO:root:Representer model:  ESM-1b
INFO:root:Representation type:  finetuned
INFO:root:Precision type:  full
INFO:root:Representer model:  ESM-1b
INFO:root:Precision type:  full
INFO:root:Number of representations:  414
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:Dataset name:  iontransporters_membraneproteins
INFO:root:Dataset type:  balanced
INFO:root:Dataset number:  10
INFO:root:Representation type:  finetuned
INFO:root:Representer model:  ESM-1b
INFO:root:Precision type:  full
INFO:root:Number of representations:  414
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:Dataset name:  iontransporters_membraneproteins
INFO:root:Dataset type:  balanced
INFO:root:Dataset number:  10
INFO:root:Representation type:  finetuned
INFO:root:Representer model:  ESM-1b
INFO:root:Precision type:  full
INFO:root:Number of X_train:  561
INFO:root:Number of Y_train:  561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Fold 1
Traceback (most recent call last):
  File "cnn_cv.py", line 331, in <module>
INFO:root:Number of X_train:  561
INFO:root:Number of Y_train:  561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
    train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
INFO:root:
Fold 1
Traceback (most recent call last):
  File "cnn_cv.py", line 331, in <module>
    train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
    raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
    raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
INFO:root:Number of X_train:  561
INFO:root:Number of Y_train:  561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Fold 1
Traceback (most recent call last):
  File "cnn_cv.py", line 331, in <module>
    train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
INFO:root:Number of X_train:  561
INFO:root:Number of Y_train:  561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Fold 1
    raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
Traceback (most recent call last):
  File "cnn_cv.py", line 331, in <module>
    train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
    raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 222606) of binary: /home/h_ghazik/python_venv/bin/python
Traceback (most recent call last):
  File "/usr/local/pkg/python-3.7.3/root/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/pkg/python-3.7.3/root/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
    )(*cmd_args)
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
cnn_cv.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-03-29_16:33:09
  host      : virya2.encs.concordia.ca
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 222607)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-03-29_16:33:09
  host      : virya2.encs.concordia.ca
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 222608)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-03-29_16:33:09
  host      : virya2.encs.concordia.ca
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 222609)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-29_16:33:09
  host      : virya2.encs.concordia.ca
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 222606)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================