forked from cvg/glue-factory
treeDepth-19-07-distributedTraining-128-grayscale.log
279 lines (254 loc) · 17.2 KB
Starting the distributed training at Sat 20 Jul 09:09:30 BST 2024
Command running in screen session distributed_train_0_20240720-090930 on gpu36
Command running in screen session distributed_train_1_20240720-090930 on gpu31
Command running in screen session distributed_train_2_20240720-090930 on gpu03
Command running in screen session distributed_train_3_20240720-090930 on gpu32
Command running in screen session distributed_train_4_20240720-090930 on gpu28
Command running in screen session distributed_train_5_20240720-090930 on gpu20
Command running in screen session distributed_train_6_20240720-090930 on gpu22
Command running in screen session distributed_train_7_20240720-090930 on gpu19
Log file path is /homes/tp4618/Documents/bitbucket/SuperGlueThesis/external/glue-factory/treeDepth-19-07-distributedTraining-128-grayscale.log
Distributed training completed at Sat 20 Jul 09:09:44 BST 2024
[07/20/2024 09:09:50 gluefactory INFO] Starting experiment sp+lg_20-07-treedepth-grayscale-64
[07/20/2024 09:10:01 gluefactory INFO] Starting experiment sp+lg_20-07-treedepth-grayscale-64
[07/20/2024 09:10:00 gluefactory INFO] Training in distributed mode with 8 GPUs
[07/20/2024 09:10:06 gluefactory INFO] Will fine-tune from weights of pretrain_lightglue
[07/20/2024 09:10:07 gluefactory INFO] Training in distributed mode with 8 GPUs
[07/20/2024 09:10:13 gluefactory INFO] Will fine-tune from weights of pretrain_lightglue
[07/20/2024 09:10:13 gluefactory INFO] Training in distributed mode with 8 GPUs
[07/20/2024 09:10:15 gluefactory INFO] Using device cuda:0
[07/20/2024 09:10:15 gluefactory.datasets.base_dataset INFO] Creating dataset TreeDepth
[07/20/2024 09:10:15 gluefactory.datasets.treedepth INFO] Initialized TreeDepth dataset with configuration: {'name': 'treedepth', 'num_workers': 1, 'train_batch_size': '???', 'val_batch_size': '???', 'test_batch_size': '???', 'shuffle_training': False, 'batch_size': 8, 'num_threads': 1, 'seed': 0, 'prefetch_factor': 2, 'data_dir': 'syntheticForestData/', 'depth_subpath': 'depthData/', 'image_subpath': 'imageData/', 'info_dir': 'fileLists', 'train_split': 'train_scenes_clean.txt', 'train_num_per_scene': 300, 'val_split': 'valid_scenes_clean.txt', 'val_num_per_scene': None, 'val_pairs': 'valid_pairs.txt', 'test_split': 'test_scenes_clean.txt', 'test_num_per_scene': None, 'test_pairs': None, 'views': 2, 'min_overlap': 0.1, 'max_overlap': 0.7, 'num_overlap_bins': 3, 'sort_by_overlap': False, 'triplet_enforce_overlap': False, 'read_depth': True, 'read_image': True, 'grayscale': False, 'preprocessing': {'resize': None, 'edge_divisible_by': None, 'side': 'long', 'interpolation': 'bilinear', 'align_corners': None, 'antialias': True, 'square_pad': False, 'add_padding_mask': False}, 'p_rotate': 0.0, 'reseed': False, 'load_features': {'do': False, 'path': 'exports/megadepth-undist-depth-r1024_SP-k2048-nms3/{scene}.h5', 'data_keys': None, 'device': None, 'trainable': False, 'add_data_path': True, 'collate': False, 'scale': ['keypoints', 'lines', 'orig_lines'], 'padding_fn': 'pad_local_features', 'padding_length': 2048, 'numeric_type': 'float32'}}
[07/20/2024 09:10:16 gluefactory.datasets.treedepth INFO] Sampling new items for train with seed 0.
[07/20/2024 09:10:17 gluefactory.datasets.treedepth INFO] Sampling new ite
[07/20/2024 09:10:24 gluefactory INFO] Parameters with scaled learning rate:
{}
[07/20/2024 09:10:24 gluefactory INFO] Training with mixed_precision=None
[07/20/2024 09:10:24 gluefactory INFO] Starting training with configuration:
data:
  name: treedepth
  train_split: train_scenes_clean.txt
  train_num_per_scene: 300
  val_split: valid_scenes_clean.txt
  val_pairs: valid_pairs.txt
  min_overlap: 0.1
  max_overlap: 0.7
  num_overlap_bins: 3
  read_depth: true
  read_image: true
  batch_size: 64
  num_workers: 4
  load_features:
    do: false
    path: exports/megadepth-undist-depth-r1024_SP-k2048-nms3/{scene}.h5
    padding_length: 2048
    padding_fn: pad_local_features
model:
  name: two_view_pipeline
  extractor:
    name: extractors.superpoint_open
    max_num_keypoints: 2048
    force_num_keypoints: true
    detection_threshold: -1
    nms_radius: 3
    trainable: false
  ground_truth:
    name: matchers.depth_matcher
    th_positive: 3
    th_negative: 5
    th_epi: 5
  matcher:
    name: matchers.lightglue
    filter_threshold: 0.1
    flash: false
    checkpointed: true
  allow_no_extract: true
train:
  seed: 0
  epochs: 40
  optimizer: adam
  opt_regexp: null
  optimizer_options: {}
  lr: 0.0001
  lr_schedule:
    type: exp
    start: 30
    exp_div_10: 10
    on_epoch: true
    factor: 1.0
    options: {}
  lr_scaling:
  - - 100
    - - dampingnet.const
  eval_every_iter: 500
  save_every_iter: 5000
  log_every_iter: 100
  log_grad_every_iter: null
  test_every_epoch: 1
  keep_last_checkpoints: 10
  load_experiment: sp+lg_densehomography
  median_metrics: []
  recall_metrics: {}
  pr_metrics: {}
  best_key: loss/total
  dataset_callback_fn: sample_new_items
  dataset_callback_on_val: false
  clip_grad: 1.0
  pr_curves: {}
  plot:
  - 5
  - gluefactory.visualization.visualize_batch.make_match_figures
  submodules: []
benchmarks: null
image_mode: null
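Note on the learning-rate schedule in the configuration above: with lr 0.0001, type exp, start 30, exp_div_10 10 and on_epoch true, the rate would stay constant for the first 30 epochs and then shrink by a factor of 10 over every 10 epochs. The sketch below works through that arithmetic. It is an assumption about how the schedule is applied, not code taken from this run: the function name lr_at_epoch and the convention that the first decay step lands on epoch 30 are both hypothetical.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-4, start=30, exp_div_10=10):
    """Learning rate at a given epoch under an assumed 'exp' schedule.

    Assumption: from epoch `start` onwards the rate is multiplied once per
    epoch by gamma = 10 ** (-1 / exp_div_10), so it drops by a factor of 10
    every `exp_div_10` epochs. Before `start` it is left unchanged.
    """
    gamma = 10 ** (-1 / exp_div_10)
    # Number of multiplicative decay steps applied so far (hypothetical
    # convention: the first step is applied at epoch == start).
    decay_steps = max(0, epoch - start + 1)
    return base_lr * gamma ** decay_steps


if __name__ == "__main__":
    # Values taken from the logged configuration: 40 epochs, decay from 30.
    for epoch in (0, 29, 30, 35, 39):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.3e}")
    # Sanity check: ten decay steps amount to one division by 10.
    assert math.isclose(lr_at_epoch(39), 1e-5, rel_tol=1e-9)
```

Under these assumptions the last of the 40 epochs would run at roughly one tenth of the initial rate.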
[07/20/2024 09:10:24 gluefactory INFO] Starting epoch 0
loading pretrain lightglue, replacing model
/vol/bitbucket/tp4618/SuperGlueThesis/external/glue-factory/outputs/training/sp+lg_densehomography/checkpoint_39_61799.tar
loading pretrain lightglue, replacing model
Configuration fields in conf.model:
renaming old state dictionary keys
lines 410 of train.py with renaming keys may not work?
{'name': 'two_view_pipeline', 'extractor': {'name': 'extractors.superpoint_open', 'max_num_keypoints': 2048, 'force_num_keypoints': True, 'detection_threshold': -1, 'nms_radius': 3, 'trainable': False}, 'ground_truth': {'name': 'matchers.depth_matcher', 'th_positive': 3, 'th_negative': 5, 'th_epi': 5}, 'matcher': {'name': 'matchers.lightglue', 'filter_threshold': 0.1, 'flash': False, 'checkpointed': True}, 'allow_no_extract': True}
the rank is 3 and world size is 8
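Note on batch sizes: the TreeDepth dataset is initialised with 'batch_size': 8, while the training configuration lists batch_size: 64. With a world size of 8 this is consistent with the global batch being divided evenly across ranks. The sketch below only illustrates that arithmetic; the helper name per_rank_batch_size is hypothetical and the even-split rule is an assumption, not something this log states.

```python
def per_rank_batch_size(global_batch_size, world_size):
    """Split a global batch evenly across distributed ranks.

    Assumption: every rank receives the same number of samples, so the
    global batch size must be divisible by the world size.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size is not divisible by world size")
    return global_batch_size // world_size


if __name__ == "__main__":
    # Values from the log: batch_size 64 in the config, 8 GPUs.
    print(per_rank_batch_size(64, 8))
```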
cuda:0
23 initpy data sets path gluefactory.datasets.treedepth
205 tools.py utils the classes are [('TreeDepth', <class 'gluefactory.datasets.treedepth.TreeDepth'>)]
136 in treedepth
conf keys: dict_keys(['name', 'num_workers', 'train_batch_size', 'val_batch_size', 'test_batch_size', 'shuffle_training', 'batch_size', 'num_threads', 'seed', 'prefetch_factor', 'data_dir', 'depth_subpath', 'image_subpath', 'info_dir', 'train_split', 'train_num_per_scene', 'val_split', 'val_num_per_scene', 'val_pairs', 'test_split', 'test_num_per_scene', 'test_pairs', 'views', 'min_overlap', 'max_overlap', 'num_overlap_bins', 'sort_by_overlap', 'triplet_enforce_overlap', 'read_depth', 'read_image', 'grayscale', 'preprocessing', 'p_rotate', 'reseed', 'load_features'])
Loaded training dataset: treedepth
Using the same dataset for validation as for training.
calling get_data_loader on dataset object - dataset.get_data_loader('train', distributed=args.distributed, shuffle=False)
scene_lists_path: /homes/tp4618/Documents/bitbucket/SuperGlueThesis/external/glue-factory/gluefactory/datasets/tartanSceneLists(Full)
root and info /vol/bitbucket/tp4618/SuperGlueThesis/external/glue-factory/data/syntheticForestData fileLists
!!!!!sample_new_items: num_per_scene: 300
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
Not using the overlap matrices
scene_lists_path: /homes/tp4618/Documents/bitbucket/SuperGlueThesis/external/glue-factory/gluefactory/datasets/tartanSceneLists(Full)
root and info /vol/bitbucket/tp4618/SuperGlueThesis/external/glue-factory/data/syntheticForestData fileLists
!!!!!sample_new_items: num_per_scene: None
205 tools.py utils the classes are [('TwoViewPipeline', <class 'gluefactory.models.two_view_pipeline.TwoViewPipeline'>)]
ModuleNotFoundError: extractors.superpoint_open
205 tools.py utils the classes are [('SuperPoint', <class 'gluefactory.models.extractors.superpoint_open.SuperPoint'>)]
None
Superpoint grayscale mode selected
ModuleNotFoundError: matchers.lightglue
205 tools.py utils the classes are []
if Path(conf.weights).exists() 368 lightglue loading weights: /homes/tp4618/Documents/bitbucket/SuperGlueThesis/external/glue-factory/outputs/training/pretrain_lightglue/superpoint_lightglue.pth
state dict lightlue 385 9 to replace keys in range
remvoing log assingments
ModuleNotFoundError: matchers.depth_matcher
205 tools.py utils the classes are [('DepthMatcher', <class 'gluefactory.models.matchers.depth_matcher.DepthMatcher'>)]
initcp not none
args distibuted
device ids are [device(type='cuda', index=0)]
/homes/tp4618/Documents/bitbucket/miniconda3/envs/GlueFactory/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
[07/20/2024 09:10:48 gluefactory INFO] [E 0 | it 0] loss {total 8.629E+00, last 9.344E+00, assignment_nll 9.344E+00, nll_pos 1.730E+01, nll_neg 1.387E+00, num_matchable 8.428E+01, num_unmatchable 8.849E+02, confidence 4.062E-01, row_norm 4.628E-01}
debugging in data_to_log, iteraiton: 0
Data to be logged to training DataFrame: {'epoch': 0, 'iteration': 0, 'total': 8.628652572631836, 'last': 9.344147682189941, 'assignment_nll': 9.344147682189941, 'nll_pos': 17.300830841064453, 'nll_neg': 1.387465476989746, 'num_matchable': 84.28125, 'num_unmatchable': 884.921875, 'confidence': 0.40620216727256775, 'row_norm': 0.4627940356731415}
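Note on reading the loss lines: each "[E <epoch> | it <iteration>] loss {...}" entry carries the same quantities that are later written to the training DataFrame, rounded to four significant figures. A minimal sketch for pulling them back out of a log like this one follows. The regular expression and the function name parse_loss_line are illustrative and not part of glue-factory; they assume the line layout seen in this file.

```python
import re

# Matches e.g. "[E 0 | it 0] loss {total 8.629E+00, last 9.344E+00}".
LOSS_LINE_RE = re.compile(
    r"\[E (?P<epoch>\d+) \| it (?P<iteration>\d+)\] loss \{(?P<body>[^}]*)\}"
)


def parse_loss_line(line):
    """Return a dict of epoch, iteration and loss terms, or None if no match."""
    match = LOSS_LINE_RE.search(line)
    if match is None:
        return None
    record = {
        "epoch": int(match["epoch"]),
        "iteration": int(match["iteration"]),
    }
    # The body is a comma-separated list of "<name> <value>" pairs.
    for item in match["body"].split(", "):
        name, value = item.rsplit(" ", 1)
        record[name] = float(value)
    return record


if __name__ == "__main__":
    sample = (
        "[07/20/2024 09:10:48 gluefactory INFO] [E 0 | it 0] loss "
        "{total 8.629E+00, last 9.344E+00, assignment_nll 9.344E+00, "
        "nll_pos 1.730E+01, nll_neg 1.387E+00, num_matchable 8.428E+01, "
        "num_unmatchable 8.849E+02, confidence 4.062E-01, row_norm 4.628E-01}"
    )
    print(parse_loss_line(sample))
```

Lines that are not loss entries, such as the "Not using the overlap matrices" messages, return None and can be skipped when scanning the whole file.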