cnncv.o75104
/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py:188: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
INFO:root:Number of representations: 414
INFO:root:Number of representations: 414
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:Dataset name: iontransporters_membraneproteins
INFO:root:Dataset name: iontransporters_membraneproteins
INFO:root:Dataset type: balanced
INFO:root:Dataset type: balanced
INFO:root:Dataset number: 10
INFO:root:Dataset number: 10
INFO:root:Representation type: finetuned
INFO:root:Representer model: ESM-1b
INFO:root:Representation type: finetuned
INFO:root:Precision type: full
INFO:root:Representer model: ESM-1b
INFO:root:Precision type: full
INFO:root:Number of representations: 414
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:Dataset name: iontransporters_membraneproteins
INFO:root:Dataset type: balanced
INFO:root:Dataset number: 10
INFO:root:Representation type: finetuned
INFO:root:Representer model: ESM-1b
INFO:root:Precision type: full
INFO:root:Number of representations: 414
INFO:root:--------------------------------------------------
INFO:root:--------------------------------------------------
INFO:root:Dataset name: iontransporters_membraneproteins
INFO:root:Dataset type: balanced
INFO:root:Dataset number: 10
INFO:root:Representation type: finetuned
INFO:root:Representer model: ESM-1b
INFO:root:Precision type: full
INFO:root:Number of X_train: 561
INFO:root:Number of Y_train: 561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Fold 1
Traceback (most recent call last):
File "cnn_cv.py", line 331, in <module>
INFO:root:Number of X_train: 561
INFO:root:Number of Y_train: 561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
INFO:root:
Fold 1
Traceback (most recent call last):
File "cnn_cv.py", line 331, in <module>
train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
INFO:root:Number of X_train: 561
INFO:root:Number of Y_train: 561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Fold 1
Traceback (most recent call last):
File "cnn_cv.py", line 331, in <module>
train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
INFO:root:Number of X_train: 561
INFO:root:Number of Y_train: 561
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Applying 5-fold cross validation to the best model on the whole dataset...
INFO:root:--------------------------------------------------------------------------------
INFO:root:
Fold 1
raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
Traceback (most recent call last):
File "cnn_cv.py", line 331, in <module>
train_dataset, batch_size=settings.BATCH_SIZE, shuffle=True, sampler=train_sampler)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 321, in __init__
raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 222606) of binary: /home/h_ghazik/python_venv/bin/python
Traceback (most recent call last):
File "/usr/local/pkg/python-3.7.3/root/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/pkg/python-3.7.3/root/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
)(*cmd_args)
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/h_ghazik/python_venv/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
cnn_cv.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-03-29_16:33:09
host : virya2.encs.concordia.ca
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 222607)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-03-29_16:33:09
host : virya2.encs.concordia.ca
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 222608)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-03-29_16:33:09
host : virya2.encs.concordia.ca
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 222609)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-03-29_16:33:09
host : virya2.encs.concordia.ca
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 222606)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================