Commit: update readme
yurujaja committed Sep 16, 2024
1 parent 56a107c commit 9fdd93c
Showing 2 changed files with 124 additions and 23 deletions.
58 changes: 58 additions & 0 deletions .github/CONTRIBUTING.md
@@ -0,0 +1,58 @@
## Contributing

We welcome all forms of contributions, including but not limited to the following.

- Introduce new geospatial foundation models
- Incorporate downstream datasets
- Add new decoder heads
- Fix typos or bugs

### Workflow

1. Fork and pull the latest version of the repository
2. Check out a new branch (do not use the main branch for PRs)
3. Commit your changes
4. Create a PR

Note: For significant modifications or any spotted bugs, please consider opening an issue for discussion beforehand.


### Adding a new geospatial foundation model
1. Inside the `foundation_models` folder:
- Add your model architecture. Including the decoder is optional, as the project focuses on evaluating pretrained model encoders for downstream tasks.
- Update `__init__.py` to include the model.

2. Inside the `configs/foundation_models` folder:
- Create a configuration file for the model:
- Provide a `download_url` if available
- Detail the model, including its support for temporality, the image size used, and the encoder's output dimension
- Specify the parameters for initializing your model in `encoder_model_args`
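As a sketch, such a configuration might look like the following. Every key name and value below is illustrative, so follow the existing files in `configs/foundation_models` for the actual schema:

```yaml
# Illustrative only -- not an actual config from the repository.
encoder_name: MyEncoder
download_url: https://example.com/checkpoints/my_encoder.pt  # omit if unavailable
temporal_input: False          # whether the encoder supports temporal input
img_size: 224                  # image size used during pretraining
embed_dim: 768                 # encoder output dimension
encoder_model_args:            # passed to the encoder's constructor
  patch_size: 16
  in_chans: 6
  depth: 12
```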

### Adding a new downstream dataset
1. Inside the `datasets` folder:
- Add your dataset file.
- In the `__getitem__` function, structure the output based on the modalities available in your dataset as follows:
```
{
'image': {
'optical': optical_tensor,
'sar' : sar_tensor,
},
'target': target_tensor,
'metadata': {
"info1": info1,
}
}
```
- For uni-temporal datasets, shape the image tensors as (C, H, W)
- For multi-temporal datasets, shape the image tensors as (C, T, H, W)
- Implement a `get_splits` function to manage dataset train/val/test splits. Use other datasets as references.
- Update `__init__.py` to include the dataset.
2. In the `configs/datasets` folder:
- Add a configuration for the dataset:
- Provide a `download_url` if possible
- For a uni-temporal dataset, set `multi_temporal` to `False`; for a multi-temporal dataset, indicate the number of time frames used, e.g., `multi_temporal: 6`
- Include information about the dataset bands, including their types and statistics
- Provide information about the dataset classes, such as the number of classes, class names, the ignore index, and the class distribution
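The steps above can be sketched as a minimal dataset class. The class name, the stand-in tensors, and the `get_splits` signature are illustrative assumptions, not the project's actual API — use the existing datasets in the `datasets` folder as the authoritative reference:

```python
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """Illustrative uni-temporal dataset returning the expected output structure."""

    def __init__(self, split="train"):
        self.split = split
        self.num_samples = 10  # stand-in for a real file list loaded from disk

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Stand-in tensors; a real dataset would load imagery from disk.
        optical = torch.rand(6, 224, 224)        # (C, H, W) for uni-temporal data
        sar = torch.rand(2, 224, 224)
        target = torch.randint(0, 5, (224, 224)) # per-pixel class labels
        return {
            "image": {"optical": optical, "sar": sar},
            "target": target,
            "metadata": {"index": idx},
        }

    @staticmethod
    def get_splits(cfg=None):
        # A real implementation would derive the splits from the dataset config.
        return MyDataset("train"), MyDataset("val"), MyDataset("test")
```

A multi-temporal variant would return image tensors shaped (C, T, H, W) instead.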
89 changes: 66 additions & 23 deletions README.md
@@ -1,8 +1,7 @@
[![Tests](https://github.com/yurujaja/geofm-bench/actions/workflows/python-test.yml/badge.svg)](https://github.com/yurujaja/geofm-bench/actions/workflows/python-test.yml)

## What is New
The overall architecture of the codebase has been refactored, and a few bugs and errors have been fixed along the way.

## Introduction
(TBD)

### engines
In engines, basic modules in the training pipeline are defined, including the data preprocessor, trainer, and evaluator.
@@ -30,21 +29,52 @@
4. So far, we have UPerNet for uni-temporal semantic segmentation, UPerNetCD for change detection, and MTUPerNet for multi-temporal semantic segmentation
5. For multi-temporal inputs, L-TAE and linear projection are supported

### Other comments
1. In the segmentor config, different losses, optimizers and schedulers can be picked (you have to define them in the respective utils file).
All of these parameters can also be set in the run config file.
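One way such a by-name lookup can work is a small registry mapping config strings to classes. This is a sketch under the assumption that the utils file resolves names this way — the registries and the `build_from_config` helper below are illustrative, not the project's actual code:

```python
import torch

# Hypothetical name-to-class registries, as a utils file might define them.
LOSSES = {
    "cross_entropy": torch.nn.CrossEntropyLoss,
    "l1": torch.nn.L1Loss,
}
OPTIMIZERS = {
    "adamw": torch.optim.AdamW,
    "sgd": torch.optim.SGD,
}


def build_from_config(model_params, loss_name="cross_entropy",
                      optimizer_name="adamw", lr=1e-4):
    """Instantiate the loss and optimizer named in a (hypothetical) segmentor config."""
    loss = LOSSES[loss_name]()
    optimizer = OPTIMIZERS[optimizer_name](model_params, lr=lr)
    return loss, optimizer
```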

To use more gpus or nodes, set `--nnodes` and `--nproc_per_node` correspondingly, see:
https://pytorch.org/docs/stable/elastic/run.html

To use mixed precision training, specify either `--fp16` for float16 or `--bf16` for bfloat16

For fine-tuning instead of linear probing, specify `--finetune`.
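As an illustration of how the mixed-precision flags might translate into an autocast dtype internally — a sketch only; the actual flag handling in this repo may differ:

```python
import torch


def autocast_dtype(fp16=False, bf16=False):
    """Map the (hypothetical) --fp16/--bf16 flags to a torch autocast dtype.

    Returns None when neither flag is set, i.e. full float32 precision.
    """
    if fp16 and bf16:
        raise ValueError("choose at most one of --fp16 / --bf16")
    if fp16:
        return torch.float16
    if bf16:
        return torch.bfloat16
    return None
```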

## 🛠️ Setup
Clone the repository:
```
git clone git@github.com:yurujaja/geofm-bench.git
cd geofm-bench
```

## What is still missing
1. Add the other datasets and foundation models following the existing examples in this codebase. Meanwhile, check the correctness of the original datasets before copy-pasting. [IMPORTANT]
2. More data augmentation needs to be added. The dataset class is wrapped by a configurable augmentor that performs both data preprocessing and augmentation. In this way, we avoid preprocessing data in the main process, which is slow.
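The wrapping idea can be sketched as follows. `AugmentorWrapper` and its transform interface are illustrative assumptions, not the project's actual augmentor API — the point is only that transforms applied inside `__getitem__` run in DataLoader worker processes rather than the main process:

```python
from torch.utils.data import Dataset


class AugmentorWrapper(Dataset):
    """Wraps a dataset and applies configured preprocessing/augmentation per sample."""

    def __init__(self, dataset, transforms):
        self.dataset = dataset
        # Each transform is a callable taking and returning a sample dict.
        self.transforms = transforms

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        sample = self.dataset[idx]
        for t in self.transforms:
            sample = t(sample)
        return sample
```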
**Dependencies**

Use either Conda or Mamba:
```
conda env create -f environment.yaml
conda activate geofm-bench8
```

Optional: install [Mamba](https://github.com/conda-forge/miniforge/releases/) for faster resolution times
```
wget https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Mambaforge-24.3.0-0-Linux-x86_64.sh
bash Mambaforge-24.3.0-0-Linux-x86_64.sh
mamba env create -f environment.yaml
mamba activate geofm-bench8
```

## Setup
Should be the same as the v1 version of the code; some dependencies could perhaps be removed.
## 🏋️ Training
There are 5 basic component types in our config system:
- `config`: Information about training settings such as batch size, epochs, and whether to use wandb. `limited_label` indicates the fraction of the dataset used for training; for example, `-1` means the full training dataset is used, while `0.5` means 50% is used.
- `encoder_config`: GFM encoder-related parameters. `output_layers` specifies which encoder layers are fed to the UPerNet decoder.
- `dataset_config`: Information about downstream datasets, such as image size, band statistics, etc.
- `segmentor_config`: Parameters for fine-tuning the downstream task decoder, including the head type, loss, optimizer, scheduler, etc.
- `augmentation_config`: Both the preprocessing and augmentation steps required for the dataset, such as band adaptation, normalization, and resize/crop.
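The `limited_label` convention above could translate into a subset size roughly like this — a sketch of the stated rule, not the actual implementation:

```python
def limited_label_count(limited_label, dataset_size):
    """Return how many training samples to use given the limited_label setting.

    -1 (or any non-positive value) means the full training set; a fraction in
    (0, 1] selects that share of the samples.
    """
    if limited_label <= 0:
        return dataset_size
    return int(dataset_size * limited_label)
```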

We provide several example command lines to initialize different training tasks on a single GPU.
### 💻 Decoder Finetuning
**Single Temporal Semantic Segmentation**

Set `config`, `encoder_config`, `dataset_config`, `segmentor_config` and `augmentation_config`, then start the training process on a single GPU. The following takes the MADOS dataset, the Prithvi encoder and the UPerNet decoder as an example:
```
torchrun --nnodes=1 --nproc_per_node=1 run.py \
--config configs/run/default.yaml \
...
--num_workers 4 --eval_interval 1 --use_wandb
```

**Multi Temporal Semantic Segmentation**

The multi-temporal segmentor config `configs/segmentors/upernet_mt.yaml` should be used. In addition, in the dataset config, indicate the number of time frames, e.g., `multi_temporal: 6`.
```
torchrun --nnodes=1 --nproc_per_node=1 run.py \
--config configs/run/default.yaml \
--encoder_config configs/foundation_models/prithvi.yaml \
--dataset_config configs/datasets/croptypemapping.yaml \
--segmentor_config configs/segmentors/upernet_mt.yaml \
--augmentation_config configs/augmentations/segmentation_default.yaml \
--augmentation_config configs/augmentations/ctm.yaml \
--num_workers 4 --eval_interval 1 --use_wandb
```

**Multi Temporal Change Detection**
```
torchrun ...
```

**Multi Temporal Regression**
```
torchrun ...
```

### 💻 Fully Supervised Training
**Single Temporal Change Detection**
```
torchrun ...
```
## 🏃 Evaluation
Indicate the `eval_dir` where the checkpoints and configurations are stored.
```
torchrun --nnodes=1 --nproc_per_node=1 run.py --batch_size 1 --eval_dir work-dir/the-folder-where-your-exp-is-saved
```

## ✏️ Contributing
We appreciate all contributions to improve this project. Please refer to the [Contributing Guidelines](.github/CONTRIBUTING.md)




@@ -91,4 +134,4 @@
| Encoder | Dataset | Epochs | Score |
|---|---|---|---|
| Prithvi | HLSBurnScars | 80 | 86.208 |
| Prithvi | Sen1Floods11 | 80 | 87.217 |


## 💡 Acknowledgements
