Commit
Common input pipeline for single- and multi-band imagery and AOI object for input data (#309)

* update csv to expect 3 mandatory columns and one optional. See comments in issue #221
* use inference data for binary segmentation in tests/, not data/
* environment.yml: hardcode setuptools version because of pytorch bug
* environment.yml: set correct subversion to setuptools
* environment.yml: move setuptools from conda section to pip
* sampling_segmentation.py: implement AOI class
  verifications.py: update assert_crs_match function, add validate functions for rasters and vector files
* remove support for AWS bucket via boto3
* finish draft of sampling with AOI objects (with basic validation), rather than from raw csv lines
* environment.yml: fix and update
* environment.yml: add issue link for setuptools
* environment.yml: add issue link for setuptools
* environment.yml: fix and update
* environment.yml: add issue link for setuptools
* environment.yml: add issue link for setuptools
* sampling_segmentation.py: implement AOI class
  verifications.py: update assert_crs_match function, add validate functions for rasters and vector files
* finish draft of sampling with AOI objects (with basic validation), rather than from raw csv lines
* train_segmentation.py: add warning for debugging and skip save checkpoint if val loss is None
* tests/data/massachusetts: restore larger format to prevent val_loss=None
* tests/data/massachusetts...: switch back to smaller image
  test_ci_segmentation_binary.yaml: tile images to 32, not 256
  test_ci_segmentation_multiclass.yaml: idem
  train_segmentation.py: raise ValueError for empty train or val dataloader
* aoi.py:
  - create an AOI object with input validation. AOI would be the core input for tiling, training and inference, though so far only implemented for tiling.
  - add stac item support
  geoutils.py:
  - add utils: is_stac_item, stack_vrts() to create an artificial multi-band raster from single-band files
  test_aoi.py: add first test for parsing raster input from 3 types to a single rasterio.RasterDataset object
  default.yaml: activate debug functionality for logging
  test_ci_segmentation_multiclass.yaml: replace 'modalities' with 'bands' key
  test_ci_segmentation_binary.yaml: idem
  sample_creation.py: delete
  utils.py: remove validation from read_csv() function.
* inference_segmentation.py: remove read_modalities
  README.md: start updating
* evaluate_segmentation.py: fix bug (remove read_modalities())
  dataset/README.md: add documentation on input data configuration and csv format
  README.md: update
* aoi.py:
  - add write multiband function for demo and debugging
  - move aois_from_csv from sampling_segmentation.py
* aoi.py: remove circular import automatically created by Pycharm
* fix typos and potential bugs introduced by Pycharm's automatic refactoring
* aoi.py: use pre-existing raster validation function
  sampling_segmentation.py: move validation to aoi object
  utils.py: finish removing AWS bucket feature
  verifications.py:
  - update all data validation functions
* inference_segmentation.py: remove bucket parameter in list_input_images
* test_aoi.py: use local stac item (prevent timeout error at CI)
Showing 20 changed files with 640 additions and 784 deletions.
@@ -0,0 +1,59 @@
# Input data
The sampling and inference steps require a csv referencing input data. An example of an input csv can be found in [tests](tests/sampling/sampling_segmentation_binary_ci.csv).
Each row of this csv is considered, in geo-deep-learning terms, to be an [AOI](https://torchgeo.readthedocs.io/en/latest/user/glossary.html#term-area-of-interest-AOI).

| raster path               | vector ground truth path | dataset split | aoi id (optional) |
|---------------------------|--------------------------|---------------|-------------------|
| my_dir/my_geoimagery1.tif | my_dir/my_geogt1.gpkg    | trn           | Ontario-1         |
| my_dir/my_geoimagery2.tif | my_dir/my_geogt2.gpkg    | tst           | NewBrunswick-23   |
| ...                       | ...                      | ...           | ...               |

> The use of aoi id information will be implemented in the near future. It will serve, for example, to print a detailed report of sampling, training and evaluation, or to make debugging easier.

The path to a custom csv must be entered in the [dataset configuration](https://github.com/NRCan/geo-deep-learning/blob/develop/config/dataset/test_ci_segmentation_binary.yaml#L9). See the [configuration documentation](config/README.md) for more information.
Also check the [suggested folder structure](https://github.com/NRCan/geo-deep-learning#folder-structure).

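
As an illustration of the csv layout described above, here is a minimal sketch of reading such a file with the Python standard library. It assumes a comma-separated file without a header row; the names `CsvRow` and `read_input_csv` are hypothetical and are not part of geo-deep-learning.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class CsvRow(NamedTuple):
    """One csv line, i.e. one AOI (hypothetical container for illustration)."""
    raster: str
    ground_truth: str
    split: str
    aoi_id: Optional[str] = None


def read_input_csv(handle) -> List[CsvRow]:
    """Parse rows made of 3 mandatory columns and 1 optional column.

    Assumption: comma-separated values and no header row.
    """
    rows = []
    for line_number, fields in enumerate(csv.reader(handle), start=1):
        fields = [field.strip() for field in fields]
        if not any(fields):
            continue  # skip blank lines
        if len(fields) not in (3, 4):
            raise ValueError(
                f"line {line_number}: expected 3 or 4 columns, got {len(fields)}"
            )
        # The aoi id is optional: missing or empty becomes None.
        aoi_id = fields[3] if len(fields) == 4 and fields[3] else None
        rows.append(CsvRow(fields[0], fields[1], fields[2], aoi_id))
    return rows


example = io.StringIO(
    "my_dir/my_geoimagery1.tif,my_dir/my_geogt1.gpkg,trn,Ontario-1\n"
    "my_dir/my_geoimagery2.tif,my_dir/my_geogt2.gpkg,tst\n"
)
rows = read_input_csv(example)
```

The second row omits the optional fourth column, so its `aoi_id` is `None`.
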
## Dataset splits
The split in the csv should be either "trn", "tst" or "inference". The validation split is automatically created during sampling. Its proportion is set by the [dataset config](https://github.com/NRCan/geo-deep-learning/blob/develop/config/dataset/test_ci_segmentation_binary.yaml#L8).

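
The following sketch shows one way the split rules above could be checked and how a validation share could be carved out of training samples. How geo-deep-learning actually assigns samples to validation is not described here, so the random partition, the `val_proportion` parameter and both function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

VALID_SPLITS = {"trn", "tst", "inference"}


def check_split(split: str) -> str:
    """Return the split unchanged, or raise if it is not an accepted value."""
    if split not in VALID_SPLITS:
        raise ValueError(
            f"split must be one of {sorted(VALID_SPLITS)}, got {split!r}"
        )
    return split


def carve_validation(
    samples: Sequence, val_proportion: float, seed: int = 0
) -> Tuple[List, List]:
    """Randomly move a share of training samples to a validation list.

    Assumption: a plain random partition; the real sampling step may differ.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_proportion)
    return shuffled[n_val:], shuffled[:n_val]


train, val = carve_validation(list(range(10)), val_proportion=0.3)
```

Note that "val" is deliberately rejected by `check_split`: per the text above, the validation split is never written in the csv.
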
## Raster and vector file compatibility
Rasters to be used must be in a format compatible with [rasterio](https://rasterio.readthedocs.io/en/latest/quickstart.html?highlight=supported%20raster%20format#opening-a-dataset-in-reading-mode)/[GDAL](https://gdal.org/drivers/raster/index.html) (e.g. GeoTIFF). Similarly, labels (aka annotations) for each image must be stored as polygons in a [Geopandas compatible vector file](https://geopandas.org/en/stable/docs/user_guide/io.html#reading-spatial-data) (e.g. GeoPackage).

## Single-band vs multi-band imagery

To support both single-band and multi-band imagery, the path in the first column of an input csv can be in **one of three formats**:

### 1. Path to a multi-band image file
`my_dir/my_multiband_geofile.tif`

### 2. Path to single-band image files, using only a common string
A set of single-band rasters can be referenced by a single path in the csv, in which only the string common to all single-band files is written out.
The band-specific part of the file name must be given in a [hydra-like interpolation format](https://hydra.cc/docs/1.0/advanced/override_grammar/basic/#primitives), with `${...}` notation. During execution, the interpolation string is completed by a dataset parameter holding the list of desired band identifiers, which resolves the single-band filenames.

#### Example:

In [dataset config](../config/dataset/test_ci_segmentation_binary.yaml):

`bands: [R, G, B]`

In [input csv](../tests/sampling/sampling_segmentation_binary_ci.csv):

| raster path                                                | ground truth path | dataset split |
|------------------------------------------------------------|-------------------|---------------|
| my_dir/my_singleband_geofile_band_**${dataset.bands}**.tif | gt.gpkg           | trn           |

During execution, this would result in using, **in the same order as bands appear in dataset config**, the following files:
`my_dir/my_singleband_geofile_band_R.tif`
`my_dir/my_singleband_geofile_band_G.tif`
`my_dir/my_singleband_geofile_band_B.tif`

> To simplify the use of both single-band and multi-band rasters through a unique input pipeline, single-band files are artificially merged as a [virtual raster](https://gdal.org/drivers/raster/vrt.html).

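
The filename resolution described in this example can be sketched as a plain string substitution. This is only an illustration of the expected result, not the project's implementation (which may rely on hydra/omegaconf); `expand_band_paths` and `PLACEHOLDER` are hypothetical names.

```python
from typing import List

PLACEHOLDER = "${dataset.bands}"


def expand_band_paths(template: str, bands: List[str]) -> List[str]:
    """Resolve one csv path into one path per band, keeping the band order.

    A path without the placeholder is treated as a single (multi-band) file.
    """
    if PLACEHOLDER not in template:
        return [template]
    return [template.replace(PLACEHOLDER, str(band)) for band in bands]


paths = expand_band_paths(
    "my_dir/my_singleband_geofile_band_${dataset.bands}.tif",
    ["R", "G", "B"],
)
for path in paths:
    print(path)
# my_dir/my_singleband_geofile_band_R.tif
# my_dir/my_singleband_geofile_band_G.tif
# my_dir/my_singleband_geofile_band_B.tif
```

Because the list comprehension iterates over `bands` as given, reordering the list in the dataset config reorders the resolved files accordingly.
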
### 3. Path to a STAC Item
> Only STAC Items referencing **single-band assets** are currently supported. See [our Worldview-2 example](https://datacube-stage.services.geo.ca/api/collections/spacenet-samples/items/SpaceNet_AOI_2_Las_Vegas-056155973080_01_P001-WV03).

Bands must be selected by [common name](https://github.com/stac-extensions/eo/#common-band-names) in dataset config:
`bands: ["red", "green", "blue"]`

> Order matters: `["red", "green", "blue"]` is not equal to `["blue", "green", "red"]`!

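
To make the ordering rule concrete, here is a minimal sketch of selecting single-band assets by common name from a STAC Item loaded as a dictionary. It assumes each asset carries an `eo:bands` list with a `common_name` entry, as in the EO extension; the tiny `item` dictionary and the function name are made up for illustration and do not reproduce the linked example.

```python
from typing import Dict, List


def select_assets_by_common_name(item: Dict, bands: List[str]) -> List[str]:
    """Return asset hrefs in the order of the requested common names."""
    href_by_name = {}
    for asset in item.get("assets", {}).values():
        for band in asset.get("eo:bands", []):
            name = band.get("common_name")
            if name:
                href_by_name[name] = asset["href"]
    missing = [band for band in bands if band not in href_by_name]
    if missing:
        raise KeyError(f"no asset found for band(s): {missing}")
    return [href_by_name[band] for band in bands]


# Hypothetical, heavily trimmed STAC Item with single-band assets.
item = {
    "assets": {
        "B": {"href": "blue.tif", "eo:bands": [{"common_name": "blue"}]},
        "G": {"href": "green.tif", "eo:bands": [{"common_name": "green"}]},
        "R": {"href": "red.tif", "eo:bands": [{"common_name": "red"}]},
    }
}

rgb = select_assets_by_common_name(item, ["red", "green", "blue"])
bgr = select_assets_by_common_name(item, ["blue", "green", "red"])
```

Here `rgb` and `bgr` contain the same three files in opposite order, which is why the order of `bands` in the dataset config determines the band order of the resulting raster.
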