Commit

Merge branch 'develop' of https://github.com/NRCan/geo-deep-learning
mpelchat04 committed Oct 7, 2022
2 parents 092d06b + cdea293 commit da034c9
Showing 32 changed files with 339 additions and 192 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -53,7 +53,7 @@ cd geo-deep-learning
2. Run the desired script (for segmentation).
```shell
# Creating the hdf5 from the raw data
-python GDL.py mode=sampling
+python GDL.py mode=tiling
# Training the neural network
python GDL.py mode=train
# Inference on the data
11 changes: 5 additions & 6 deletions config/README.md
@@ -65,7 +65,7 @@ The **_tracker section_** is set to `None` by default, but will still log the in
If you want to set a tracker you can change the value in the config file or add the tracker parameter at execution time via the command line `python GDL.py tracker=mlflow mode=train`.

The **_inference section_** contains the information to execute the inference job (more options will follow soon).
-This part doesn't need to be filled if you want to launch sampling, train or hyperparameters search mode only.
+This part doesn't need to be filled if you want to launch tiling, train or hyperparameters search mode only.

The **_task section_** manages the executing task. `Segmentation` is the default task since it's the primary task of GDL.
However, the goal will be to add tasks as need be. The `GDL.py` code simply executes the main function from the `task_mode.py` in the main folder of GDL.
@@ -83,7 +83,7 @@ general:
max_epochs: 2 # for train only
min_epochs: 1 # for train only
raw_data_dir: data
-raw_data_csv: tests/sampling/sampling_segmentation_binary_ci.csv
+raw_data_csv: tests/tiling/tiling_segmentation_binary_ci.csv
sample_data_dir: data # where the hdf5 will be saved
state_dict_path:
save_weights_dir: saved_model/${general.project_name}
@@ -95,10 +95,10 @@ If `True`, will save the config in the log folder.

#### Mode Section
```YAML
-mode: {sampling, train, inference, evaluate, hyperparameters_search}
+mode: {tiling, train, inference, evaluate, hyperparameters_search}
```
-**GDL** has five modes: sampling, train, evaluate, inference and hyperparameters search.
+**GDL** has five modes: tiling, train, evaluate, inference and hyperparameters search.
-- *sampling*, generates `hdf5` files from a folder containing folders for each individual image with their ground truth.
+- *tiling*, generates .geotiff and .geojson [chips](https://torchgeo.readthedocs.io/en/latest/user/glossary.html#term-chip) from each source aoi (image & ground truth).
- *train*, will train the model specified with all the parameters in `training`, `trainer`, `optimizer`, `callbacks` and `scheduler`. The outcome will be `.pth` weights.
- *evaluate*, this function needs to be filled with images, their ground truth and a weight for the model. At the end of the evaluation you will obtain statistics on those images.
- *inference*, unlike the evaluation, the inference doesn't need a ground truth. The inference will produce a prediction on the content of the images fed to the model. Depending on the task, the outcome file will differ.
@@ -148,4 +148,3 @@ new:
$ python GDL.py --config-name=/path/to/new/gdl_config.yaml mode=train
```


4 changes: 2 additions & 2 deletions config/dataset/README.md
@@ -5,7 +5,7 @@
### Input dimensions and overlap

These parameters respectively set the width and length of a single sample and stride from one sample to another as
-outputted by sampling_segmentation.py. Default to 256 and 0, respectively.
+outputted by tiling_segmentation.py. Default to 256 and 0, respectively.
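As a rough illustration of how chip size and overlap interact, here is a minimal sketch of how many chips one raster axis yields under a sliding window. This is not GDL's actual implementation; the handling of partial chips at raster borders may differ.

```python
def n_chips(raster_px: int, chip_size: int = 256, overlap: int = 0) -> int:
    """Number of full chips along one axis for a hypothetical sliding window."""
    stride = chip_size - overlap  # step between consecutive chip origins
    return max(0, (raster_px - chip_size) // stride + 1)

print(n_chips(1024))            # 4 chips along one axis with the defaults (256, 0)
print(n_chips(1024, 256, 128))  # 7 chips with 50% overlap
```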

### Train/validation percentage

@@ -31,7 +31,7 @@ For more information on the concept of stratified sampling, see [this Medium art

### Modalities

-Bands to be selected during the sampling process. Order matters (ie "BGR" is not equal to "RGB").
+Bands to be selected during the tiling process. Order matters (ie "BGR" is not equal to "RGB").
The use of this feature for band selection is a work in progress. It currently serves to indicate how many bands are in
source imagery.

9 changes: 2 additions & 7 deletions config/dataset/test_ci_segmentation_binary.yaml
@@ -2,26 +2,21 @@
dataset:
# dataset-wide
name:
input_dim: 32
overlap:
use_stratification: False
train_val_percent: {'trn':0.7, 'val':0.3, 'tst':0}
raw_data_csv: ${general.raw_data_csv}
raw_data_dir: ${general.raw_data_dir}
download_data: False

# imagery
-bands: [R, G, B]
+bands: [1,2,3]

# ground truth
attribute_field: properties/class
attribute_values: [1]
min_annotated_percent:
class_name: # will follow in the next version
classes_dict: {'BUIL':1}
class_weights:
ignore_index: -1

# outputs
-sample_data_dir: ${general.sample_data_dir}
+tiling_data_dir: ${general.tiling_data_dir}

9 changes: 2 additions & 7 deletions config/dataset/test_ci_segmentation_binary_stac.yaml
@@ -2,11 +2,7 @@
dataset:
# dataset-wide
name:
input_dim: 32
overlap:
use_stratification: False
train_val_percent: {'trn':0.7, 'val':0.3, 'tst':0}
-raw_data_csv: tests/sampling/sampling_segmentation_binary-stac_ci.csv
+raw_data_csv: tests/tiling/tiling_segmentation_binary-stac_ci.csv
raw_data_dir: ${general.raw_data_dir}
download_data: False

@@ -16,12 +12,11 @@ dataset:
# ground truth
attribute_field:
attribute_values:
min_annotated_percent:
class_name: # will follow in the next version
classes_dict: {'BUIL':1}
class_weights:
ignore_index: -1

# outputs
-sample_data_dir: ${general.sample_data_dir}
+tiling_data_dir: ${general.tiling_data_dir}

11 changes: 3 additions & 8 deletions config/dataset/test_ci_segmentation_multiclass.yaml
@@ -2,26 +2,21 @@
dataset:
# dataset-wide
name:
input_dim: 32
overlap:
use_stratification: False
train_val_percent: {'trn':0.7, 'val':0.3, 'tst':0}
-raw_data_csv: tests/sampling/sampling_segmentation_multiclass_ci.csv
+raw_data_csv: tests/tiling/tiling_segmentation_multiclass_ci.csv
raw_data_dir: ${general.raw_data_dir}
download_data: False

# imagery
-bands: [R, G, B]
+bands: [1,2,3]

# ground truth
attribute_field: properties/Quatreclasses
attribute_values: [1,2,3,4]
min_annotated_percent:
class_name: # will follow in the next version
classes_dict: {'WAER':1, 'FORE':2, 'ROAI':3, 'BUIL':4}
class_weights:
ignore_index: 255

# outputs
-sample_data_dir: ${general.sample_data_dir}
+tiling_data_dir: ${general.tiling_data_dir}

7 changes: 4 additions & 3 deletions config/gdl_config_template.yaml
@@ -1,6 +1,7 @@
defaults:
- model: gdl_unet
- verify: default_verify
- tiling: default_tiling
- training: default_training
- loss: binary/softbce
- optimizer: adamw
Expand Down Expand Up @@ -31,10 +32,10 @@ general:
max_epochs: 2 # for train only
min_epochs: 1 # for train only
raw_data_dir: dataset
-raw_data_csv: tests/sampling/sampling_segmentation_binary_ci.csv
-sample_data_dir: dataset # where the hdf5 will be saved
+raw_data_csv: tests/tiling/tiling_segmentation_binary_ci.csv
+tiling_data_dir: dataset # where the hdf5 will be saved
save_weights_dir: saved_model/${general.project_name}

print_config: True # save the config in the log folder
-mode: {verify, sampling, train, inference, evaluate}
+mode: {verify, tiling, train, inference, evaluate}
debug: True #False # will print the complete yaml config plus run a validation test
8 changes: 8 additions & 0 deletions config/tiling/default_tiling.yaml
@@ -0,0 +1,8 @@
# @package _global_
tiling:
tiling_data_dir: ${general.tiling_data_dir}
train_val_percent: {'trn':0.7, 'val':0.3, 'tst':0}
chip_size: 32
overlap_size:
min_annot_perc: 1
use_stratification: False
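The `train_val_percent` mapping above distributes tiled chips between the `trn`, `val` and `tst` datasets. Below is a hedged, stdlib-only sketch of what such a split amounts to; GDL's actual assignment logic (including the `use_stratification` option) lives in the tiling code and may differ.

```python
import random

def split_chips(chip_ids, percents, seed=0):
    """Randomly assign chip ids to named splits according to fractional percents."""
    rng = random.Random(seed)
    shuffled = rng.sample(chip_ids, k=len(chip_ids))
    out, start = {}, 0
    for name, frac in percents.items():
        end = start + round(frac * len(shuffled))
        out[name] = shuffled[start:end]
        start = end
    return out

splits = split_chips(list(range(10)), {'trn': 0.7, 'val': 0.3, 'tst': 0})
print({k: len(v) for k, v in splits.items()})  # {'trn': 7, 'val': 3, 'tst': 0}
```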
10 changes: 10 additions & 0 deletions dataset/README.md
@@ -27,6 +27,16 @@ To support both single-band and multi-band imagery, the path in the first column
### 1. Path to a multi-band image file:
`my_dir/my_multiband_geofile.tif`

A particular order or subset of bands in a multi-band file can be selected by setting a list of band indices:

#### Example:

`bands: [3, 2, 1]`

Here, if the original multi-band raster had BGR bands, geo-deep-learning will reorder these bands to RGB order.

The `bands` parameter is set in the [dataset config](../config/dataset/test_ci_segmentation_multiclass.yaml).
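A toy sketch of the reordering semantics, assuming 1-based band indices as rasterio uses them (the real subsetting is delegated to `subset_multiband_vrt` in `dataset/aoi.py`):

```python
# Hypothetical in-memory "raster": one entry per band, stored in BGR order.
raster_bands = {1: "blue_data", 2: "green_data", 3: "red_data"}  # 1-based indices

bands = [3, 2, 1]  # as in the dataset config: bands: [3, 2, 1]
reordered = [raster_bands[i] for i in bands]
print(reordered)  # ['red_data', 'green_data', 'blue_data'] -> RGB order
```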

### 2. Path to single-band image files, using only a common string
A path to a list of single-band rasters can be inserted in the csv, but only the string common to all single-band files should be used.
The "band specific" string in the file name must be in a [hydra-like interpolation format](https://hydra.cc/docs/1.0/advanced/override_grammar/basic/#primitives), with `${...}` notation. The interpolation string is completed during execution by a dataset parameter containing the list of desired band identifiers, which resolves the single-band filenames.
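Concretely, the resolution boils down to one string replacement per requested band, mirroring the replace loop in `dataset/aoi.py::parse_input_raster` (the filename below is hypothetical):

```python
csv_raster_str = "my_dir/my_singleband_geofile_band_${dataset.bands}.tif"  # hypothetical csv entry
bands_requested = ["R", "G", "B"]  # dataset parameter listing band identifiers

# Same pattern as the replace loop in parse_input_raster
rasters = [csv_raster_str.replace("${dataset.bands}", band) for band in bands_requested]
print(rasters[0])  # my_dir/my_singleband_geofile_band_R.tif
```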
45 changes: 31 additions & 14 deletions dataset/aoi.py
@@ -18,7 +18,7 @@
from torchvision.datasets.utils import download_url
from tqdm import tqdm

from utils.geoutils import stack_singlebands_vrt, is_stac_item, create_new_raster_from_base
from utils.geoutils import stack_singlebands_vrt, is_stac_item, create_new_raster_from_base, subset_multiband_vrt
from utils.logger import get_logger
from utils.utils import read_csv
from utils.verifications import assert_crs_match, validate_raster, \
@@ -51,8 +51,8 @@ def __init__(
self.item = item
self._assets_by_common_name = None

-if bands_requested is not None and len(bands_requested) == 0:
-logging.warning(f"At least one band should be chosen if assets need to be reached")
+if not bands_requested:
+raise ValueError(f"At least one band should be chosen if assets need to be reached")

# Create band inventory (all available bands)
self.bands_all = [band for band in self.asset_by_common_name.keys()]
@@ -183,7 +183,10 @@ def __init__(self, raster: Union[Path, str],
self.raster_stac_item = None

# If parsed result has more than a single file, then we're dealing with single-band files
-self.raster_src_is_multiband = True if len(raster_parsed) == 1 else False
+if len(raster_parsed) == 1 and rasterio.open(raster_parsed[0]).count > 1:
+self.raster_src_is_multiband = True
+else:
+self.raster_src_is_multiband = False

# Download assets if desired
self.download_data = download_data
@@ -203,8 +206,8 @@ def __init__(self, raster: Union[Path, str],
self.raster_parsed = raster_parsed

# if single band assets, build multiband VRT
-self.raster_to_multiband(virtual=True)
-self.raster_read()
+self.src_raster_to_dest_multiband(virtual=True)
+self.raster_open()
self.raster_meta = self.raster.meta
self.raster_meta['name'] = self.raster.name
if self.raster_src_is_multiband:
@@ -297,8 +300,8 @@ def __init__(self, raster: Union[Path, str],
)
if len(self.label_gdf_filtered) == 0:
logging.warning(f"\nNo features found for ground truth \"{self.label}\","
f"\nfiltered by attribute field \"{self.attr_field_filter}\""
f"\nwith values \"{self.attr_values_filter}\"")
f"\nfiltered by attribute field \"{self.attr_field_filter}\""
f"\nwith values \"{self.attr_values_filter}\"")
else:
self.label_gdf_filtered = None

@@ -347,7 +350,6 @@ def from_dict(cls,
)
return new_aoi

# TODO: is this necessary if to_dict() is good enough?
def __str__(self):
return (
f"\nAOI ID: {self.aoi_id}"
@@ -359,16 +361,29 @@ def __str__(self):
f"\n\tAttribute values filter: {self.attr_values_filter}"
)

-def raster_to_multiband(self, virtual=True):
+def src_raster_to_dest_multiband(self, virtual=True):
"""
Outputs a multiband raster from multiple sources of input raster
E.g.: multiple singleband files, single multiband file with undesired bands, etc.
"""
if not self.raster_src_is_multiband:
if virtual:
self.raster_multiband = stack_singlebands_vrt(self.raster_parsed)
else:
self.raster_multiband = self.write_multiband_from_singleband_rasters_as_vrt()
elif self.raster_src_is_multiband and self.raster_bands_request:
if not all([isinstance(band, int) for band in self.raster_bands_request]):
raise ValueError(f"Use only a list of integers to select bands from a multiband raster.\n"
f"Got {self.raster_bands_request}")
if len(self.raster_bands_request) > rasterio.open(self.raster_raw_input).count:
raise ValueError(f"Trying to subset more bands than actual number in source raster.\n"
f"Requested: {self.raster_bands_request}\n"
f"Available: {rasterio.open(self.raster_raw_input).count}")
self.raster_multiband = subset_multiband_vrt(self.raster_parsed[0], band_request=self.raster_bands_request)
else:
self.raster_multiband = self.raster_parsed[0]

-def raster_read(self):
+def raster_open(self):
self.raster = _check_rasterio_im_load(self.raster_multiband)

def to_dict(self, extended=True):
@@ -509,8 +524,10 @@ def parse_input_raster(
raster = [value['meta'].href for value in item.bands_requested.values()]
return raster
elif "${dataset.bands}" in csv_raster_str:
-if not isinstance(raster_bands_requested, (List, ListConfig, tuple)) or len(raster_bands_requested) == 0:
-raise TypeError(f"\nRequested bands should a list of bands. "
+if not raster_bands_requested \
+or not isinstance(raster_bands_requested, (List, ListConfig, tuple)) \
+or len(raster_bands_requested) == 0:
+raise TypeError(f"\nRequested bands should be a list of bands. "
f"\nGot {raster_bands_requested} of type {type(raster_bands_requested)}")
raster = [csv_raster_str.replace("${dataset.bands}", band) for band in raster_bands_requested]
return raster
@@ -593,7 +610,7 @@ def aois_from_csv(
@param csv_path:
path to csv file containing list of input data. See README for details on expected structure of csv.
@param bands_requested:
-List of bands to select from inputted imagery. Applies only to single-band input imagery.
+List of bands to select from inputted imagery
@param attr_values_filter:
Attribute filed to filter features from
@param attr_field_filter:
2 changes: 1 addition & 1 deletion environment.yml
@@ -7,7 +7,7 @@ dependencies:
- docker-py>=4.4.4
- geopandas>=0.10.2
- h5py>=3.7
-  - hydra-core>=1.1.0
+  - hydra-core>=1.2.0
- pip
- pystac>=0.3.0
- pytest>=7.1
5 changes: 0 additions & 5 deletions inference_segmentation.py
@@ -138,7 +138,6 @@ def segmentation(param,
chunk_size: int,
device,
scale: List,
-BGR_to_RGB: bool,
tp_mem,
debug=False,
):
@@ -152,7 +151,6 @@ def segmentation(param,
chunk_size: image tile size
device: cuda/cpu device
scale: scale range
-BGR_to_RGB: True/False
tp_mem: memory temp file for saving numpy array to disk
debug: True/False
@@ -192,7 +190,6 @@ def segmentation(param,
sample['metadata'] = image_metadata
totensor_transform = augmentation.compose_transforms(param,
dataset="tst",
-input_space=BGR_to_RGB,
scale=scale,
aug_type='totensor',
print_log=print_log)
@@ -341,7 +338,6 @@ def main(params: Union[DictConfig, dict]) -> None:
# Default input directory based on default output directory
raw_data_csv = get_key_def('raw_data_csv', params['inference'], default=working_folder,
expected_type=str, to_path=True, validate_path_exists=True)
-BGR_to_RGB = get_key_def('BGR_to_RGB', params['dataset'], expected_type=bool)

# LOGGING PARAMETERS
exper_name = get_key_def('project_name', params['general'], default='gdl-training')
@@ -403,7 +399,6 @@ def main(params: Union[DictConfig, dict]) -> None:
chunk_size=chunk_size,
device=device,
scale=scale,
-BGR_to_RGB=BGR_to_RGB,
tp_mem=temp_file,
debug=debug)

