Stratified sampling #83

alishibli97 · 2024-10-02T18:49:29Z

No description provided.

KerekesDavid

Good job, seems to be working with MADOS on my end.

One nitpick: the stratification itself takes quite long, and one of the use cases for limiting dataset size is a quick test run on a few samples. Could you add a toggle that restores the old behavior of plain random sampling?
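For reference, the old random-sampling behavior mentioned here can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `random_subset_indices` and the `seed` parameter are assumptions made for this example.

```python
import random
from typing import List


def random_subset_indices(n_samples: int, fraction: float, seed: int = 0) -> List[int]:
    """Pick int(n_samples * fraction) distinct indices uniformly at random.

    Hypothetical helper for illustration; the name and the seed argument
    are not taken from the PR.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return rng.sample(range(n_samples), int(n_samples * fraction))
```

The returned indices could then be wrapped with `torch.utils.data.Subset`, as the removed lines in `pangaea/run.py` appear to have done. No label statistics are computed, which is why this path is fast enough for a quick test run.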

KerekesDavid · 2024-10-04T13:15:04Z

pangaea/run.py

-                range(n_train_samples), int(n_train_samples * cfg.limited_label)
-            )
-            train_dataset = Subset(train_dataset, indices)
+            # n_train_samples = len(train_dataset)


You can remove these, that's why we have git :)

VMarsocci · 2024-10-04T15:42:26Z

Hi, before merging, please consider the following changes (already discussed in private, but I am reporting them here for everyone else):

- add the option to apply stratified sampling to a single set (either train or val)
- check the latest version of HLSBurnScars (the one in this branch is not up to date)
- replace the "calculate class distribution" function with simply reading this distribution from the dataset config

Thanks a lot :)

yurujaja · 2024-10-07T15:13:17Z

To address the comments, the following modifications were made:

- sampling is now configurable for both train and val
- the limited label strategy is configurable: random or stratified
- the unused function in HLSBurnScars is removed
- the "calculate class distribution" function is still needed: it computes the image-wise distribution that drives the selection of images
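To illustrate what an image-wise class distribution means in the last point, here is a minimal sketch that works on a single label mask given as nested lists. The function name `class_distribution` and the choice to skip labels outside `[0, num_classes)` are assumptions for this example, not the behavior of the function in the PR.

```python
from collections import Counter
from typing import List, Sequence


def class_distribution(mask: Sequence[Sequence[int]], num_classes: int) -> List[float]:
    """Return the fraction of pixels per class for one label mask.

    Assumption: labels outside [0, num_classes), such as an ignore index,
    are skipped and do not count toward the total.
    """
    counts = Counter(
        label for row in mask for label in row if 0 <= label < num_classes
    )
    total = sum(counts.values())
    if total == 0:
        # No valid pixels: report an all-zero distribution.
        return [0.0] * num_classes
    return [counts.get(c, 0) / total for c in range(num_classes)]
```

A dataset-level distribution stored in the config cannot replace this, because selecting images requires knowing how the classes are distributed inside each individual image.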

alishibli97 · 2024-10-08T12:18:52Z

Added regression stratification.
In train.yaml, one can specify the stratification method with the parameter:
limited_label_strategy: stratified_classification # or stratified_regression, or random

@RituYadav92, could you validate that it works on your side with BioMassters?
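A small sketch of how such a config value might be validated before use. The helper `resolve_strategy`, the plain-dict config, and the fallback to `"random"` when the key is missing are all assumptions for illustration; only the three strategy names come from the comment above.

```python
from typing import Mapping

# Strategy names as listed in the comment above.
VALID_STRATEGIES = ("random", "stratified_classification", "stratified_regression")


def resolve_strategy(cfg: Mapping[str, str]) -> str:
    """Read and validate limited_label_strategy from a config mapping.

    Assumption: a missing key falls back to "random"; the PR may handle
    this differently.
    """
    strategy = cfg.get("limited_label_strategy", "random")
    if strategy not in VALID_STRATEGIES:
        valid = ", ".join(VALID_STRATEGIES)
        raise ValueError(
            f"Unknown limited_label_strategy {strategy!r}; expected one of: {valid}"
        )
    return strategy
```

Failing early on an unknown name avoids silently running a long stratification, or none at all, because of a typo in the YAML file.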


RituYadav92 · 2024-10-09T10:46:05Z

I made two modifications to the code:

- Updated the code to select a fraction of labels from each bin.
- Renamed the variables "labeled_idx, unlabeled_idx" to "selected_idx, other_idx".

Please update the same for classification if it fits well.
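The per-bin selection described above can be sketched as follows. This is an illustrative version under stated assumptions, not the code in `subset_sampler.py`: the function name, equal-width binning over one scalar per sample (for example a mean biomass value), and the `num_bins` and `seed` parameters are all hypothetical. The names `selected_idx` and `other_idx` follow the renaming mentioned in the comment.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratify_regression_sketch(
    values: Sequence[float], fraction: float, num_bins: int = 3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Select `fraction` of the samples from each equal-width value bin."""
    if not values:
        return [], []
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are identical (hi == lo).
    width = (hi - lo) / num_bins or 1.0

    indices_per_bin = defaultdict(list)
    for idx, value in enumerate(values):
        # Clamp so that the maximum value lands in the last bin.
        bin_id = min(int((value - lo) / width), num_bins - 1)
        indices_per_bin[bin_id].append(idx)

    selected_idx: List[int] = []
    for bin_id in sorted(indices_per_bin):
        members = indices_per_bin[bin_id]
        selected_idx.extend(rng.sample(members, int(len(members) * fraction)))

    selected = set(selected_idx)
    other_idx = [i for i in range(len(values)) if i not in selected]
    return sorted(selected_idx), other_idx
```

Sampling inside every bin is what avoids the earlier problem of taking a fraction from the sorted values, which for biomass picked only the lowest-valued samples.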

pangaea/utils/subset_sampler.py

RituYadav92

The code looks fine now except line 68 in subset_sampler.py. It should be "if bin_id in indices_per_bin:" instead of "if bin_id not in indices_per_bin:"

Please check.

alishibli97 · 2024-10-10T13:42:01Z

I think it is fine; what do you suspect?
It only initializes the dict key if it is not present.

RituYadav92 · 2024-10-10T14:09:36Z

I see now: you didn't adapt the initialization from regression but did it another way. No problem. Resolved.
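The pattern discussed in this thread, shown in isolation. This is a generic sketch, not the code at the line under review; the function names are hypothetical, and `collections.defaultdict` is included only as an equivalent alternative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def group_by_bin(bin_ids: Iterable[Hashable]) -> Dict[Hashable, List[int]]:
    """Group sample indices by bin, creating each list on first use."""
    indices_per_bin: Dict[Hashable, List[int]] = {}
    for idx, bin_id in enumerate(bin_ids):
        if bin_id not in indices_per_bin:
            # First time this bin is seen: initialize its list.
            indices_per_bin[bin_id] = []
        indices_per_bin[bin_id].append(idx)
    return indices_per_bin


def group_by_bin_defaultdict(bin_ids: Iterable[Hashable]) -> Dict[Hashable, List[int]]:
    """Same grouping, letting defaultdict create the missing lists."""
    indices_per_bin = defaultdict(list)
    for idx, bin_id in enumerate(bin_ids):
        indices_per_bin[bin_id].append(idx)
    return dict(indices_per_bin)
```

With `if bin_id in indices_per_bin:` in place of the `not in` check, no key would ever be created in the first function, and the first `append` would raise a `KeyError`.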

alishibli97 and others added 7 commits September 24, 2024 15:02

add stratified sampling to training set

73ca7ab

add function in hls class

00ce22e

adding startification

cd3ce0c

add geofmsubset class

90e8c3c

add val stratification and logging info

b3a5a1a

Merge remote-tracking branch 'origin/main' into stratified_sampling

ed59f03

update readme

f709e35

alishibli97 assigned KerekesDavid, gle-bellier, nascetti-a, yurujaja and VMarsocci Oct 2, 2024

KerekesDavid approved these changes Oct 4, 2024

KerekesDavid linked an issue Oct 4, 2024 that may be closed by this pull request

Limited label problem and startified sampling #82

Closed

VMarsocci requested review from KerekesDavid, gle-bellier, VMarsocci and yurujaja October 4, 2024 15:55

VMarsocci unassigned nascetti-a Oct 4, 2024

yurujaja added 3 commits October 7, 2024 12:11

Merge remote-tracking branch 'origin/main' into stratified_sampling

8fbc77e

re-add hlsburn train-val split

f396083

limited label for both train and val, random or stratified sampling

8080464

add regression stratification

f25a445

yurujaja requested a review from RituYadav92 October 8, 2024 15:08

Updated "stratify_regression_dataset_indices" function to return fraction of labels from each bin

33053c2

Previous code: A fraction of labels were selected from the sorted values. Specifically, for biomass, it was selecting samples with the lowest biomass.

adding segmentation stratification

927eec6

yurujaja and others added 3 commits October 10, 2024 13:23

deep copy ckpt

04f783b

Merge remote-tracking branch 'origin' into stratified_sampling

8a4e365

synched the steps between classification and regression

b3b1378

RituYadav92 reviewed Oct 10, 2024

pangaea/utils/subset_sampler.py (resolved)

RituYadav92 reviewed Oct 10, 2024

yurujaja added 2 commits October 10, 2024 16:03

enable stratified sampling and oversampling

bdc7dd6

fix conflict

32429d7

yurujaja and others added 2 commits October 10, 2024 16:39

add docstring

28238db

Update README.md

afa76a0

VMarsocci approved these changes Oct 10, 2024

RituYadav92 and others added 2 commits October 10, 2024 17:02

Added comment to guide oversampling for biomass or regression in general

9159a83

Update a comment

424535c

yurujaja merged commit 5794b4c into main Oct 10, 2024
1 check passed

yurujaja deleted the stratified_sampling branch October 10, 2024 15:33
