Commit b7dbc93

Merge branch 'develop' of https://github.com/NRCan/geo-deep-learning

mpelchat04 committed Jun 8, 2022
2 parents 87d01b0 + 406771c

Showing 99 changed files with 4,011 additions and 3,112 deletions.
6 changes: 2 additions & 4 deletions .github/workflows/github-actions-ci.yml
@@ -23,7 +23,5 @@ jobs:
run: |
source /usr/share/miniconda/etc/profile.d/conda.sh
conda activate geo_deep_env
python GDL.py mode=sampling
python GDL.py mode=train
python GDL.py mode=inference
python GDL.py mode=evaluate
coverage run -m pytest --log-cli-level=INFO --capture=tee-sys
coverage report -m --sort=Cover
3 changes: 2 additions & 1 deletion GDL.py
@@ -41,7 +41,8 @@ def run_gdl(cfg: DictConfig) -> None:
# check if the mode is chosen
if type(cfg.mode) is DictConfig:
msg = "You need to choose between those modes: {}"
raise logging.critical(msg.format(list(cfg.mode.keys())))
logging.critical(msg.format(list(cfg.mode.keys())))
raise ValueError()

# save all overwritten parameters
logging.info('\nOverwritten parameters in the config: \n' + cfg.general.config_override_dirname)
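For context, the bug fixed here: `logging.critical()` returns `None`, so the old `raise logging.critical(...)` effectively executes `raise None` and dies with a `TypeError` instead of surfacing the message. A minimal runnable sketch of the corrected pattern (the `check_mode` wrapper and the sample call are added for illustration only):

```python
import logging

from omegaconf import DictConfig, OmegaConf


def check_mode(cfg: DictConfig) -> None:
    # Log the human-readable message first, then raise a real exception;
    # `raise logging.critical(...)` would raise None, which is not allowed.
    if type(cfg.mode) is DictConfig:
        msg = "You need to choose between those modes: {}"
        logging.critical(msg.format(list(cfg.mode.keys())))
        raise ValueError()


check_mode(OmegaConf.create({"mode": "train"}))  # a chosen mode passes silently
```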
54 changes: 13 additions & 41 deletions README.md
@@ -3,19 +3,16 @@

## **Overview**

The **geo-deep-learning** project stems from an initiative at NRCan's [CCMEO](https://www.nrcan.gc.ca/earth-sciences/geomatics/10776). Its aim is to allow using Convolutional Neural Networks (CNN) with georeferenced data sets.
The overall learning process comprises three broad stages.
The **geo-deep-learning** project stems from an initiative at NRCan's [CCMEO](https://www.nrcan.gc.ca/earth-sciences/geomatics/10776). Its aim is to allow using Convolutional Neural Networks (CNN) with georeferenced datasets.

### Data preparation
The data preparation phase (sampling) allows creating sub-images that will be used for either training, validation or testing.
The first phase of the process is to determine sub-images (samples) to be used for training, validation and, optionally, test.
Images to be used must be of the geotiff type.
Sample locations in each image must be stored in a GeoPackage.
In geo-deep-learning, the learning process comprises two broad stages: sampling and training, followed by inference, which makes use of a trained model to make new predictions on unseen imagery.

[comment]: <> (> Note: A data analysis module can be found [here]&#40;./utils/data_analysis.py&#41; and the documentation in [`docs/README.md`]&#40;./docs/README.md&#41;. Useful for balancing training data.)
### Data sampling (or [tiling](https://torchgeo.readthedocs.io/en/latest/user/glossary.html#term-tiling))
The data preparation phase creates [chips](https://torchgeo.readthedocs.io/en/latest/user/glossary.html#term-chip) (or patches) that will be used for either training, validation or testing.
The sampling step requires a csv as input with a list of rasters and labels to be used in the subsequent training phase. See [dataset documentation](dataset#input-data).
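For illustration, a hypothetical input csv: the column layout (raster, ground-truth GeoPackage, dataset split) and the paths are assumptions here; the linked dataset documentation is the authoritative reference.

```
data/images/area1.tif,data/labels/area1.gpkg,trn
data/images/area2.tif,data/labels/area2.gpkg,tst
```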

### Training, along with validation and testing
The training phase is where the neural network learn to use the data prepared in the previous phase to make all the predictions.
The training phase is where the neural network learns to use the data prepared in the previous phase to make all the predictions.
The crux of the learning process is the training phase.

- Samples labeled "*trn*" as per above are used to train the neural network.
@@ -35,21 +32,17 @@ This project comprises a set of commands to be run at a shell command prompt. E
- [miniconda](https://docs.conda.io/en/latest/miniconda.html) (highly recommended)
- nvidia GPU (highly recommended)

> The system can be used on your workstation or cluster and on [AWS](https://aws.amazon.com/).
> The system can be used on your workstation or cluster.
## **Installation**
Those steps are for your workstation on Ubuntu 18.04 using miniconda.
Set and activate your python environment with the following commands:
To execute scripts in this project, first create and activate your python environment with the following commands:
```shell
conda env create -f environment.yml
conda activate geo_deep_env
```
> For Windows OS:
> - Install rasterio, fiona and gdal first, before installing the rest. We've experienced some [installation issues](https://github.com/conda-forge/gdal-feedstock/issues/213), with those libraries.
> - Mlflow should be installed using pip rather than conda, as mentioned [here](https://github.com/mlflow/mlflow/issues/1951)
> Tested on Ubuntu 20.04 and Windows 10 using miniconda.
## **Running GDL**
This is an example of how to run GDL with hydra in simple steps with the _**massachusetts buildings**_ dataset in the `/data` folder, for segmentation on buildings:
This is an example of how to run GDL with hydra in simple steps with the _**massachusetts buildings**_ dataset in the `tests/data/` folder, for segmentation on buildings:

1. Clone this github repo.
```shell
@@ -67,15 +60,14 @@ python GDL.py mode=train
python GDL.py mode=inference
```

> This example is running with the default configuration `./config/gdl_config_template.yaml`, for further examples on running options see the [documentation](config/#Examples).
> You will also fund information on how to change the model or add a new one to GDL.
> This example runs with a default configuration `./config/gdl_config_template.yaml`. For further examples on configuration options see the [configuration documentation](config/#Examples).
> If you want to introduce a new task like object detection, you only need to add the code in the main folder and name it `object_detection_sampling.py` for example.
> The principle is to name the code like `task_mode.py` and the `GDL.py` will deal with the rest.
> The principle is to name the code like `{task}_{mode}.py` and the `GDL.py` will deal with the rest.
> To run it, you will need to add a new parameter in the command line `python GDL.py mode=sampling task=object_detection` or change the parameter inside the `./config/gdl_config_template.yaml`.
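A minimal sketch of how such name-based dispatch can work (hypothetical code, not `GDL.py`'s actual implementation; the per-script `main(cfg)` entry point is an assumption):

```python
import importlib

from omegaconf import DictConfig


def dispatch(cfg: DictConfig) -> None:
    # e.g. mode=sampling, task=object_detection -> "object_detection_sampling"
    module = importlib.import_module(f"{cfg.task}_{cfg.mode}")
    module.main(cfg)  # assumed entry point exposed by each task_mode script
```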
## **Folder Structure**
We suggest a high level structure to organize the images and the code.
We suggest the following high level structure to organize the images and the code.
```
├── {dataset_name}
└── data
@@ -124,30 +116,10 @@ _**Don't forget to change the path of the dataset in the config yaml.**_

[comment]: <> ( model_name: deeplabv3_resnet101 # <-- must be deeplabv3_resnet101)

[comment]: <> ( bucket_name:)

[comment]: <> ( task: segmentation # <-- must be a segmentation task)

[comment]: <> ( num_gpus: 2)

[comment]: <> ( BGR_to_RGB: False # <-- must be already in RGB)

[comment]: <> ( scale_data: [0,1])

[comment]: <> ( aux_vector_file:)

[comment]: <> ( aux_vector_attrib:)

[comment]: <> ( aux_vector_ids:)

[comment]: <> ( aux_vector_dist_maps:)

[comment]: <> ( aux_vector_dist_log:)

[comment]: <> ( aux_vector_scale:)

[comment]: <> ( debug_mode: True)

[comment]: <> ( # Module to include the NIR)

[comment]: <> ( modalities: RGBN # <-- must be add)
18 changes: 6 additions & 12 deletions config/README.md
@@ -25,8 +25,6 @@ general:
...
inference:
...
AWS:
...
print_config: ...
mode: ...
debug: ...
@@ -76,32 +74,28 @@ The chosen `yaml` from the task categories will gather all the parameters releva
#### General Section
```YAML
general:
work_dir: ${hydra:runtime.cwd}
work_dir: ${hydra:runtime.cwd} # where the code is executed
config_name: ${hydra:job.config_name}
config_override_dirname: ${hydra:job.override_dirname}
config_path: ${hydra:runtime.config_sources}
project_name: template_project
workspace: your_name
device: cuda
max_epochs: 2 # for train only
min_epochs: 1 # for train only
raw_data_dir: ${general.work_dir}/data
raw_data_csv: ${general.work_dir}/data/images_to_samples_ci_csv.csv
sample_data_dir: ${general.work_dir}/data
raw_data_dir: data
raw_data_csv: tests/sampling/sampling_segmentation_binary_ci.csv
sample_data_dir: data # where the hdf5 will be saved
state_dict_path:
save_weights_dir: ${general.work_dir}/weights_saved
save_weights_dir: saved_model/${general.project_name}
```
This section contains general information that will be read by the code. Other `yaml` files read information from here.

#### AWS Section
Will follow soon.

#### Print Config Section
If `True`, will save the config in the log folder.

#### Mode Section
```YAML
mode: {sampling, train, evaluate, inference, hyperparameters_search}
mode: {sampling, train, inference, evaluate, hyperparameters_search}
```
**GDL** has five modes: sampling, train, evaluate, inference and hyperparameters search.
- *sampling*, generates `hdf5` files from a folder containing folders for each individual image with their ground truth.
34 changes: 17 additions & 17 deletions config/augmentation/basic_augmentation_segmentation.yaml
@@ -1,22 +1,22 @@
# @package _global_
augmentation:
# Normalization: parameters for finetuning. For example:
# -> mean: [0.485, 0.456, 0.406]
# -> std: std: [0.229, 0.224, 0.225])
normalization:
mean:
std:
scale_data: [0, 1]
# Rotate limit: the upper and lower limits for data rotation.
rotate_limit: 45
# Rotate probability: the probability for data rotation.
rotate_prob: 0.5
# Horizontal flip: the probability for data horizontal flip.
hflip_prob: 0.5
# Geometric augmentations
rotate_limit: 45 # Rotate limit: the upper and lower limits for data rotation.
rotate_prob: 0.5 # Rotate probability: the probability for data rotation.
hflip_prob: 0.5 # Horizontal flip: the probability for data horizontal flip.
crop_size: # size to crop data (image and labels samples) during training

# Radiometric augmentations
noise: # Standard deviation of Gaussian Noise
# Range of the random percentile:
# the range in which a random percentile value will
# be chosen to trim values. This value applies to
# both left and right sides of the raster's histogram.
# the range in which a random percentile value will be chosen to trim values.
# This value applies to both left and right sides of the raster's histogram.
random_radiom_trim_range: [ 0.1, 2.0 ]
brightness_contrast_range: # Not yet implemented
noise: # Standard deviation of Gaussian Noise (optional)

# Augmentations done immediately before conversion to torch tensor
normalization: # Normalization: parameters for finetuning. See examples below:
mean: # -> mean: [0.485, 0.456, 0.406]
std: # -> std: std: [0.229, 0.224, 0.225])
scale_data: [ 0, 1 ] # Min and max value to scale values of input imagery
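To make these parameters concrete, here is how they could map onto a standard augmentation pipeline; this uses albumentations purely as an illustration (GDL's own augmentation utilities are the reference implementation), with example crop and normalization values:

```python
import albumentations as A

train_transforms = A.Compose([
    # Geometric augmentations
    A.Rotate(limit=45, p=0.5),            # rotate_limit, rotate_prob
    A.HorizontalFlip(p=0.5),              # hflip_prob
    A.RandomCrop(height=256, width=256),  # crop_size (example value)
    # Radiometric augmentations
    A.GaussNoise(p=0.5),                  # noise, std left at library defaults
    # Applied last, immediately before conversion to a tensor
    A.Normalize(mean=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225)),
])
```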

config/dataset/test_ci_segmentation_binary.yaml
@@ -2,22 +2,22 @@
dataset:
# dataset-wide
name:
input_dim: 256
input_dim: 32
overlap:
use_stratification: False
train_val_percent: {'trn':0.7, 'val':0.3, 'tst':0}
raw_data_csv: ${general.raw_data_csv}
raw_data_dir: ${general.raw_data_dir}

# imagery
modalities: RGB
bands: [R, G, B]

# ground truth
attribute_field: properties/class
attribute_values: [1]
min_annot_perc:
min_annotated_percent:
class_name: # will follow in the next version
classes_dict: {'Building':1}
classes_dict: {'BUIL':1}
class_weights:
ignore_index: -1

26 changes: 26 additions & 0 deletions config/dataset/test_ci_segmentation_multiclass.yaml
@@ -0,0 +1,26 @@
# @package _global_
dataset:
# dataset-wide
name:
input_dim: 32
overlap:
use_stratification: False
train_val_percent: {'trn':0.7, 'val':0.3, 'tst':0}
raw_data_csv: tests/sampling/sampling_segmentation_multiclass_ci.csv
raw_data_dir: ${general.raw_data_dir}

# imagery
bands: [R, G, B]

# ground truth
attribute_field: properties/Quatreclasses
attribute_values: [1,2,3,4]
min_annotated_percent:
class_name: # will follow in the next version
classes_dict: {'WAER':1, 'FORE':2, 'ROAI':3, 'BUIL':4}
class_weights:
ignore_index: 255

# outputs
sample_data_dir: ${general.sample_data_dir}
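In this configuration, polygons whose `properties/Quatreclasses` attribute is in `attribute_values` are burned into the label raster as the class indices of `classes_dict`, and 255 marks pixels ignored by the loss. An illustrative sketch of that burning step (not GDL's actual code), using rasterio:

```python
from rasterio import features

def burn_labels(geoms_with_attr, out_shape, transform):
    # geoms_with_attr: (geometry, attribute value) pairs read from the
    # 'properties/Quatreclasses' field of the ground-truth vector file.
    valid = {1, 2, 3, 4}  # attribute_values above
    shapes = ((geom, val) for geom, val in geoms_with_attr if val in valid)
    # Unlabelled pixels stay 0 (background); 255 would mark ignored areas.
    return features.rasterize(shapes, out_shape=out_shape,
                              transform=transform, fill=0, dtype="uint8")
```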

21 changes: 8 additions & 13 deletions config/gdl_config_template.yaml
@@ -1,15 +1,15 @@
defaults:
- model: unet
- model: gdl_unet
- training: default_training
- loss: binary/softbce
- optimizer: adamw
- callbacks: default_callbacks
- scheduler: plateau
- dataset: test_ci_segmentation_dataset
- dataset: test_ci_segmentation_binary
- augmentation: basic_augmentation_segmentation
- tracker: # set logger here or use command line (e.g. `python GDL.py tracker=mlflow`)
- visualization: default_visualization
- inference: default_inference
- inference: default_binary
- hydra: default
- override hydra/hydra_logging: colorlog # enable color logging to make it pretty
- override hydra/job_logging: colorlog # enable color logging to make it pretty
@@ -27,18 +27,13 @@ general:
config_path: ${hydra:runtime.config_sources}
project_name: template_project
workspace: your_name
device: cuda
max_epochs: 2 # for train only
min_epochs: 1 # for train only
raw_data_dir: ${general.work_dir}/data
raw_data_csv: ${general.work_dir}/data/images_to_samples_ci_csv.csv
sample_data_dir: ${general.work_dir}/data # where the hdf5 will be saved
state_dict_path:
save_weights_dir: ${general.work_dir}/weights_saved

AWS:
bucket_name:
raw_data_dir: dataset
raw_data_csv: tests/sampling/sampling_segmentation_binary_ci.csv
sample_data_dir: dataset # where the hdf5 will be saved
save_weights_dir: saved_model/${general.project_name}

print_config: True # save the config in the log folder
mode: {sampling, train, evaluate, inference, hyperparameters_search}
mode: {sampling, train, inference, evaluate}
debug: True #False # will print the complete yaml config plus run a validation test
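With these defaults, any group can still be swapped at the command line through hydra overrides; for instance, a hypothetical run combining the new multiclass variants added in this commit:

```shell
python GDL.py mode=train dataset=test_ci_segmentation_multiclass inference=default_multiclass
```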
1 change: 1 addition & 0 deletions config/hydra/default.yaml
@@ -4,6 +4,7 @@ run:
sweep:
dir: logs/multiruns/${now:%Y-%m-%d_%H-%M-%S}
subdir: ${hydra.job.num}
verbose: ${debug}

# you can set here environment variables that are universal for all users
# for system specific variables (like data paths) it's better to use .env file!
16 changes: 16 additions & 0 deletions config/inference/default_binary.yaml
@@ -0,0 +1,16 @@
# @package _global_
inference:
img_dir_or_csv_file: tests/inference/inference_segmentation_ci.csv
state_dict_path: ${general.save_weights_dir}/checkpoint.pth.tar
chunk_size: # if empty, will be calculated automatically from max_pix_per_mb_gpu
# Maximum number of pixels each Mb of GPU Ram to allow. E.g. if GPU has 1000 Mb of Ram and this parameter is set to
# 10, chunk_size will be set to sqrt(1000 * 10) = 100.
max_pix_per_mb_gpu: 25

# GPU parameters
gpu: ${training.num_gpus}
max_used_perc: ${training.max_used_perc} # If RAM usage of detected GPU exceeds this percentage, it will be ignored
max_used_ram: ${training.max_used_ram} # If GPU's usage exceeds this percentage, it will be ignored

# Post-processing
ras2vec: False # if True, a polygonized version of the inference (.gpkg) will be created with rasterio tools
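The automatic fallback described in the comment is simple arithmetic; a small sketch (the helper name is assumed, not the project's API):

```python
from math import sqrt


def auto_chunk_size(gpu_ram_mb: float, max_pix_per_mb_gpu: float = 25.0) -> int:
    # chunk_size defaults to sqrt(GPU RAM in MB * max_pix_per_mb_gpu),
    # e.g. sqrt(1000 * 10) = 100, matching the example above.
    return int(sqrt(gpu_ram_mb * max_pix_per_mb_gpu))


assert auto_chunk_size(1000, 10) == 100
```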
8 changes: 0 additions & 8 deletions config/inference/default_inference.yaml

This file was deleted.

16 changes: 16 additions & 0 deletions config/inference/default_multiclass.yaml
@@ -0,0 +1,16 @@
# @package _global_
inference:
img_dir_or_csv_file: tests/sampling/sampling_segmentation_multiclass_ci.csv
state_dict_path: ${general.save_weights_dir}/checkpoint.pth.tar
chunk_size: # if empty, will be calculated automatically from max_pix_per_mb_gpu
# Maximum number of pixels each Mb of GPU Ram to allow. E.g. if GPU has 1000 Mb of Ram and this parameter is set to
# 10, chunk_size will be set to sqrt(1000 * 10) = 100.
max_pix_per_mb_gpu: 25

# GPU parameters
gpu: ${training.num_gpus}
max_used_perc: ${training.max_used_perc} # If RAM usage of detected GPU exceeds this percentage, it will be ignored
max_used_ram: ${training.max_used_ram} # If GPU's usage exceeds this percentage, it will be ignored

# Post-processing
ras2vec: False # if True, a polygonized version of the inference (.gpkg) will be created with rasterio tools
5 changes: 0 additions & 5 deletions config/model/checkpoint_unet.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions config/model/deeplabv3+_pretrained.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions config/model/deeplabv3_resnet101.yaml

This file was deleted.

15 changes: 0 additions & 15 deletions config/model/fastrcnn.yaml

This file was deleted.

