new preprocessor and only fix known bugs #89

Closed
wants to merge 11 commits into from

LeungTsang
Collaborator

@LeungTsang LeungTsang commented Oct 11, 2024

1. The preprocessor is redesigned again.

1.1 The previous GeoFMDataset is renamed to RawFMDataset, and the current GeoFMDataset is a wrapper class that combines RawFMDataset with the Preprocessor.

1.2 The Preprocessor is initialized from preprocessor_cfg, dataset_cfg, and encoder_cfg. It then initializes the list of configured preprocessors, tracking data statistics/info (e.g., data mean/std) along the way so that the preprocessors work properly in any order.
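To illustrate the statistics tracking described in 1.2, here is a minimal pure-Python sketch. It is not the code in this PR: the class names `SelectBands` and `Normalize`, the `update_meta` method, and the `meta` layout are all hypothetical. The idea is that each step is built from the statistics valid at its position and then reports how it changes them, so the result does not depend on the order of the steps.

```python
from typing import Dict, List, Sequence

Meta = Dict[str, list]  # hypothetical per-band info with keys "bands", "mean", "std"


class SelectBands:
    """Keep a subset of bands and shrink the tracked statistics to match."""

    def __init__(self, meta: Meta, keep: Sequence[str]) -> None:
        self.idx = [meta["bands"].index(name) for name in keep]

    def update_meta(self, meta: Meta) -> Meta:
        return {key: [values[i] for i in self.idx] for key, values in meta.items()}

    def __call__(self, pixel: List[float]) -> List[float]:
        return [pixel[i] for i in self.idx]


class Normalize:
    """Standardize each band with the statistics valid at this pipeline stage."""

    def __init__(self, meta: Meta) -> None:
        self.mean, self.std = list(meta["mean"]), list(meta["std"])

    def update_meta(self, meta: Meta) -> Meta:
        n = len(self.mean)
        # After normalization the data has zero mean and unit std per band.
        return {**meta, "mean": [0.0] * n, "std": [1.0] * n}

    def __call__(self, pixel: List[float]) -> List[float]:
        return [(x - m) / s for x, m, s in zip(pixel, self.mean, self.std)]


def build_pipeline(specs, meta: Meta):
    """Instantiate steps in order, threading the updated metadata through."""
    steps = []
    for cls, kwargs in specs:
        step = cls(meta, **kwargs)
        meta = step.update_meta(meta)
        steps.append(step)
    return steps, meta


def run(steps, pixel: List[float]) -> List[float]:
    for step in steps:
        pixel = step(pixel)
    return pixel


meta = {"bands": ["B2", "B3", "B4"], "mean": [1.0, 2.0, 3.0], "std": [1.0, 2.0, 4.0]}
pixel = [3.0, 6.0, 11.0]  # one pixel, one value per band
keep = {"keep": ["B4", "B2"]}
filter_first, _ = build_pipeline([(SelectBands, keep), (Normalize, {})], meta)
normalize_first, _ = build_pipeline([(Normalize, {}), (SelectBands, keep)], meta)
```

Both pipelines give the same output for `pixel`, because `Normalize` always sees the mean/std of the bands that are actually present when it runs.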

1.3 BandAdaptor is split into BandFilter and BandPadding. BandFilter runs at the beginning and BandPadding at the end, to avoid operating on trivial bands.
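The ordering in 1.3 can be sketched as follows. This is an illustration under assumptions, not the BandFilter/BandPadding implementation: the function names, the dict-of-bands image layout, and the zero fill value are hypothetical. Unused bands are dropped before any other work, and missing bands are filled only at the very end.

```python
from typing import Dict, List

Plane = List[List[float]]  # one band stored as an H x W nested list


def filter_bands(image: Dict[str, Plane], wanted: List[str]) -> Dict[str, Plane]:
    """Keep only the bands that the encoder wants and the dataset provides."""
    return {name: image[name] for name in wanted if name in image}


def pad_bands(image: Dict[str, Plane], wanted: List[str], fill: float = 0.0) -> List[Plane]:
    """Stack bands in encoder order; bands the dataset lacks become constant planes.

    Assumes at least one real band is present so the plane size can be read.
    """
    first = next(iter(image.values()))
    height, width = len(first), len(first[0])
    stacked = []
    for name in wanted:
        if name in image:
            stacked.append(image[name])
        else:
            stacked.append([[fill] * width for _ in range(height)])
    return stacked


image = {"B2": [[1.0, 2.0]], "B3": [[3.0, 4.0]], "B8": [[5.0, 6.0]]}
encoder_bands = ["B2", "B3", "B4"]

kept = filter_bands(image, encoder_bands)   # first step: B8 is dropped
# ... crops, resizing, normalization would run here on two bands only ...
stacked = pad_bands(kept, encoder_bands)    # last step: B4 is filled
```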

1.4 Tile is replaced by sliding_inference in the evaluator. This avoids loading, decoding, and preprocessing images multiple times during evaluation, and its implementation is more straightforward. Having intact images also facilitates potential operations during inference.
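As a rough picture of what sliding-window inference over an intact image involves, here is a pure-Python sketch. It is not the `sliding_inference` function added in this PR: `window_starts`, `sliding_average`, and the `predict` callback are hypothetical names, and averaging overlapping predictions is one common choice assumed here.

```python
from typing import Callable, List

Grid = List[List[float]]


def window_starts(size: int, window: int, stride: int) -> List[int]:
    """Window offsets along one axis; the last window is aligned to the edge."""
    if size <= window:
        return [0]
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)
    return starts


def sliding_average(image: Grid, predict: Callable[[Grid], Grid],
                    window: int, stride: int) -> Grid:
    """Run `predict` on overlapping tiles and average the overlapping outputs."""
    if stride > window:
        raise ValueError("stride must not exceed window, or pixels are skipped")
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in window_starts(height, window, stride):
        for left in window_starts(width, window, stride):
            tile = [row[left:left + window] for row in image[top:top + window]]
            pred = predict(tile)  # assumed to return a grid the size of the tile
            for i, pred_row in enumerate(pred):
                for j, value in enumerate(pred_row):
                    total[top + i][left + j] += value
                    count[top + i][left + j] += 1
    return [[t / c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(total, count)]
```

The image is loaded and preprocessed once, and only the tiling happens inside the evaluation loop.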

1.5 Add FocusRandomCrop for sparsely annotated data. For example, most labels in MADOS are ignored, so RandomCrop can hardly crop regions with valid labels. FocusRandomCrop guarantees that crops contain valid labels.
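One way to obtain the guarantee described in 1.5 is to pick a valid pixel first and then place the crop around it. The sketch below assumes that strategy; the function name, the `ignore_index` value of -1, and the nested-list label format are hypothetical and may differ from the FocusRandomCrop in this PR.

```python
import random
from typing import List, Optional, Tuple


def focus_random_crop(label: List[List[int]], crop: int, ignore_index: int = -1,
                      rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Return (top, left) of a crop x crop window containing a valid label."""
    if rng is None:
        rng = random.Random()
    height, width = len(label), len(label[0])
    if crop > height or crop > width:
        raise ValueError("crop size exceeds label size")
    valid = [(r, c) for r in range(height) for c in range(width)
             if label[r][c] != ignore_index]
    if not valid:
        raise ValueError("label contains no valid pixel")
    r, c = rng.choice(valid)
    # Choose the origin so the window covers (r, c) and stays inside the image.
    top = rng.randint(max(0, r - crop + 1), min(r, height - crop))
    left = rng.randint(max(0, c - crop + 1), min(c, width - crop))
    return top, left
```

A plain random crop on a label map with a single valid pixel would usually miss it; here every returned window contains at least the chosen pixel.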

1.6 Add RandomResizedCrop.

2. Known bugs

2.1 GFM/SatlasNet/SpectralGPT did not restore the 2D feature layout correctly. They are fixed now.

2.2 For models whose output is already a multi-scale feature pyramid (swin), the Feature2Pyramid neck in upernet is skipped.

2.3 SiamUperNet feeds a single frame to the encoder, but the encoder is configured to accept 2 frames. An enforce_single_temporal method is added to switch the encoder to a single-temporal setting.

2.4 Computing the overall mean MSE in the Regression evaluator was still incorrect because the batch size is not uniform during testing and the metric was not reduced across all GPUs. It is fixed now.
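The arithmetic behind 2.4 can be shown without any GPU. In this sketch (hypothetical helper names, plain lists standing in for per-rank batches), each batch reports its sum of squared errors and its sample count; summing both before dividing gives the true mean, whereas averaging per-batch means weights small batches too heavily. In a real multi-GPU run the two sums would be combined with a collective such as `torch.distributed.all_reduce`, which is assumed here rather than shown.

```python
from typing import List, Sequence, Tuple


def batch_stats(preds: Sequence[float], targets: Sequence[float]) -> Tuple[float, int]:
    """Sum of squared errors and number of samples for one batch."""
    squared_error = sum((p - t) ** 2 for p, t in zip(preds, targets))
    return squared_error, len(preds)


def global_mse(stats: List[Tuple[float, int]]) -> float:
    """Exact MSE over all samples: total squared error / total count."""
    total_error = sum(error for error, _ in stats)
    total_count = sum(count for _, count in stats)
    return total_error / total_count


def mean_of_batch_means(stats: List[Tuple[float, int]]) -> float:
    """The biased variant: every batch counts equally regardless of size."""
    return sum(error / count for error, count in stats) / len(stats)


# Two uneven batches, e.g. the last batch of a test set holds a single sample.
stats = [
    batch_stats([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]),  # squared error 14, 3 samples
    batch_stats([4.0], [0.0]),                      # squared error 16, 1 sample
]
correct = global_mse(stats)          # 30 / 4
biased = mean_of_batch_means(stats)  # (14/3 + 16) / 2
```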

2.5 Skipping NaN losses is replaced by raising an error. The initial motivation was the same as 1.5, but the model was still not properly trained and skipping might suppress other issues.

2.6 A barrier is added so that all processes wait until checkpoint saving completes.

It was extremely tricky to fix the bugs in the encoders and decoders without an overall refactoring, as the modular design is in fact broken. Changing one small thing may break other things, possibly without being noticed, and give wrong experiment results. Please check whether new bugs have been introduced. We may indeed want to think about a refactoring before adding more "if ... else ...".

@LeungTsang
Collaborator Author

Also, one or more of the ssl4eo Google Drive download links seem to no longer work. Could someone look into it?

@@ -542,18 +544,19 @@ def load_encoder_weights(self, logger: Logger) -> None:

def forward(self, imgs):
    # Define forward pass
    print(imgs["optical"].shape)
Collaborator

Remove both prints; the second one in particular raises an error.


if len(data["target"].shape) != 2:
    raise AssertionError(f"Target dimension must be 2 (H, W), Got {str(len(data['target'].shape))}")
Collaborator

This requires single temporal datasets to return in the format (C, T, H, W), but not all of them have been updated accordingly.

output = []
for i, blk in enumerate(self.blocks):
    x = blk(x)
    if i in self.output_layers:
        if self.cls_embed:
            x = x[:, 1:]
        x = x.view(N, T, L, C).transpose(2, 3).flatten(1, 2)
Collaborator

SpectralGPT model doesn't work now, giving the error:
[rank0]: File "/localhome/yjia/code/geofm-bench/pangaea/encoders/spectralgpt_encoder.py", line 212, in forward
[rank0]: .view(
[rank0]: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
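For context, this error typically appears when `.view` is called on a tensor that is no longer contiguous in memory, for example after a `transpose`. The sketch below reproduces it on a small tensor with made-up shapes; it is not the encoder code, and which call at line 212 of the encoder actually sees a non-contiguous tensor would need to be checked there. The two usual remedies are `.reshape(...)`, as the message suggests, or `.contiguous().view(...)`.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)
y = x.transpose(1, 2)  # shape (2, 4, 3); same storage, so no longer contiguous

try:
    y.view(2, 12)      # cannot merge the last two dims without copying
    view_failed = False
except RuntimeError:
    view_failed = True

via_reshape = y.reshape(2, 12)               # copies only when it has to
via_contiguous = y.contiguous().view(2, 12)  # explicit copy, then view
```

Both remedies give the same values; `reshape` falls back to a copy when a view is impossible.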

@yurujaja yurujaja closed this Oct 16, 2024