
Estimation Tool for Spatial Prediction Models


Authors

A project from the course Geosoftware 2 at the Institute for Geoinformatics, University of Münster, by Jakob Danel, Fabian Schumacher, Thalis Goldschmidt, Henning Sander and Frederick Bruch

Abstract

Machine learning methods have become very popular for spatial prediction tasks such as classifying remote sensing images, especially because of their ability to learn non-linear relationships and thereby solve more complex classification tasks. An often underestimated issue is that machine learning algorithms can only provide meaningful predictions when applied to data that is similar to the data they were trained on (Meyer and Pebesma, 2021). "Similar" here refers to the value ranges of the predictor variables (such as the different bands of a remote sensing image). When a trained machine learning algorithm is applied to a new geographic area, it is unclear whether the pixels' properties in that area are similar enough to the training data to enable a reliable classification.

Area Of Applicability (AOA)

The Area Of Applicability is a method developed by Meyer and Pebesma (2021) to delineate areas in spatial data (here remote sensing images) that can be assumed to be areas the machine learning model can reliably be applied to. The AOA provides important additional information that should be communicated when applying machine learning methods to spatial prediction tasks, especially when predicting on a large or even global scale when training data are not evenly distributed over the target area.

Aim of the tool

The tool combines all the steps needed to perform a land use/land cover (LULC) classification: generation of satellite images, model training and prediction. In particular, it is designed to extend these steps with the AOA and to integrate this method into the typical workflow of a remote sensing scientist or researcher without requiring them to deal with its concrete implementation. Besides delineating such an area of applicability (AOA), the tool can also point to areas where additional training data should be collected in order to train a more applicable model.

Target group

Researchers and users of remote sensing methods who want to

  • use machine learning for land use classifications
  • work with Sentinel-2 data
  • know how to train and apply machine learning models, but are unable or unwilling to focus on understanding and implementing the Area of Applicability
  • work with large-scale mapping/modeling applications, but lack the necessary hardware to perform machine learning

How does the software work?

The user first selects a model to work with: they can either upload their own model via an upload button or create a new one and train it with a selectable machine learning algorithm. Depending on this choice, only specific parts of the software are executed.

Input

  • Area of interest: The area for which the land use classification and the AOA are to be calculated.
  • Training data or model: If a new model is to be created, training data must be uploaded. Otherwise, an existing model has to be uploaded by the user.
  • Machine learning algorithm and hyperparameters: If a new model is trained, the user can choose between two machine learning algorithms and, if desired, also pass hyperparameters.
  • Time period: The period in which available Sentinel-2 images are searched for.
  • Bands/predictors: All bands/predictors to be included in the Sentinel images.
  • Resolution: Resolution of the Sentinel images to be generated.
  • Maximum cloud cover: The satellite imagery search is filtered by this maximum cloud cover.

Part 1: Satellite image generation (with R)

Generation of a Sentinel-2 satellite image for the area of interest (Sentinel Image (AOI))

  • Based on the user inputs (area of interest (AOI), time period and cloud cover), the SpatioTemporal Asset Catalog (STAC) is searched for matching Sentinel-2 satellite images.
  • For each Sentinel-2 image found, all bands (except B10) are available for download. Only those pre-selected by the user are processed further.
  • If more images are found, at most 400 are used for the further calculations.
  • All images (at most 400) are then superimposed, and for each pixel the median over all images is calculated for each band.
  • This helps to avoid problems with cloud cover and other interfering factors: the more images that are found, the more likely it is to obtain a good image for model training and LULC classification.
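The search-and-composite workflow above can be sketched with the rstac and gdalcubes packages listed under Dependencies. This is an illustrative outline, not the project's actual script: the STAC endpoint, collection name, AOI coordinates, dates, cloud-cover limit and band selection are all example values.

```r
# Sketch of Part 1: STAC search + per-pixel median composite (example values).
library(rstac)
library(gdalcubes)

# 1. Search a STAC catalog for Sentinel-2 scenes matching AOI and time period
items <- stac("https://earth-search.aws.element84.com/v0") |>
  stac_search(collections = "sentinel-s2-l2a-cogs",
              bbox = c(7.5, 51.9, 7.7, 52.0),      # AOI (xmin, ymin, xmax, ymax)
              datetime = "2021-06-01/2021-08-31",  # user-selected time period
              limit = 400) |>                      # cap at 400 images
  post_request()

# 2. Build an image collection, keeping only scenes below the cloud-cover limit
col <- stac_image_collection(items$features,
                             property_filter = function(x) x[["eo:cloud_cover"]] < 20)

# 3. Define the target cube (user-selected resolution) and reduce over time
#    with the per-pixel median of each selected band
v <- cube_view(srs = "EPSG:3857", extent = col, dx = 10, dy = 10, dt = "P1Y")
composite <- raster_cube(col, v) |>
  select_bands(c("B03", "B05", "B07")) |>
  reduce_time("median(B03)", "median(B05)", "median(B07)")
write_tif(composite, dir = "out")
```

The `reduce_time("median(...)")` step is what implements the per-pixel, per-band median described above.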

Generation of a Sentinel-2 satellite image for the areas where the training data is located (Sentinel Image (training area))

  • The generation of a Sentinel-2 satellite image for the areas where the training data is located is only done if the user chose to create a new model and therefore has uploaded training data.
  • It works analogously to the generation of the Sentinel-2 image for the AOI. Instead of filtering by the AOI, it filters by the geometry of the training polygons. Pixels outside the polygons are set to NA.
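The masking step can be sketched with terra: crop the composite to the training polygons and set everything outside them to NA. File names here are placeholders, not the tool's actual paths.

```r
# Illustrative sketch: restrict the composite to the training polygons,
# setting all pixels outside them to NA (file names are placeholders).
library(terra)

composite <- rast("sentinel_training_area.tif")   # composite built as in Part 1
polygons  <- vect("training_data.gpkg")           # uploaded training polygons

# Crop to the polygons' extent, then mask: pixels not covered become NA
masked <- mask(crop(composite, polygons), polygons)
writeRaster(masked, "sentinel_training_area_masked.tif", overwrite = TRUE)
```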

Part 2: Calculation of indices (with R)

Additional indices can only be selected if the bands required for their calculation have also been chosen. They are then computed and used as additional predictors during model training.

  • Available indices:
    • NDVI, NDVI_sd_3x3, NDVI_sd_5x5
    • BSI
    • BAEI

Part 3: Model training (with R)

If the user chooses to work with their own model, no further model training is needed. If the user chooses to create a new model, some additional steps must be performed to obtain valid training data. The generated Sentinel image of the training areas (consisting of all selected bands) is combined with the information from the uploaded training data: each pixel completely covered by a training polygon is assigned the class of that polygon. The result is a dataset of all overlaid pixels with their assigned class and spectral information, which is then used to train the model.

The user can choose whether to train the model with a random forest algorithm or with a support vector machine. For both, hyperparameters can be set. The model's performance is validated with a spatial cross-validation method that leaves out whole training polygons.
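The training step can be outlined with caret and CAST, which are both listed under Dependencies. This is a hedged sketch: the data frame `training_df` and its column names (`class`, `polygon_id`) are assumptions for illustration, and the hyperparameter value is an example.

```r
# Sketch: random forest via caret with spatial cross-validation that
# leaves out whole training polygons (CAST::CreateSpacetimeFolds).
library(caret)
library(CAST)

# training_df: one row per pixel, with its band/index values, its class,
# and the id of the training polygon it came from (built in the step above)
folds <- CreateSpacetimeFolds(training_df, spacevar = "polygon_id", k = 5)

model <- train(class ~ .,
               data = training_df[, !(names(training_df) %in% "polygon_id")],
               method = "rf",                      # or "svmRadial" for an SVM
               tuneGrid = expand.grid(mtry = 2),   # example hyperparameter
               trControl = trainControl(method = "cv",
                                        index = folds$index,
                                        savePredictions = "final"))
```

Passing the polygon-based folds via `index` is what makes the cross-validation spatial: all pixels of a polygon are held out together, so performance is not inflated by spatially autocorrelated neighbors.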

Part 4: Prediction and AOA (with R)

With the help of the trained model and the generated Sentinel image for the AOI, a prediction is calculated. To make statements about the applicability of the model, especially in unknown areas, the AOA is computed. In the areas where the model is not applicable according to the AOA, random points are generated and suggested to the user as potential locations for collecting new training data. If such data is acquired and incorporated into the model, better results can be expected.
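This step maps directly onto the CAST package, whose `aoa()` function is the reference implementation of the method by Meyer and Pebesma (2021). The following is a sketch under the assumption that `predictors` is the predictor SpatRaster and `model` the caret model from the previous parts; the sample size is an example value.

```r
# Sketch of Part 4: prediction, AOA, and candidate new training locations.
library(terra)
library(CAST)

prediction <- predict(predictors, model, na.rm = TRUE)   # LULC classification
model_aoa  <- aoa(predictors, model)                     # dissimilarity index + AOA

# In the AOA raster, 1 = inside, 0 = outside the area of applicability.
# Keep only the "outside" cells and sample random candidate points there.
outside   <- classify(model_aoa$AOA, cbind(1, NA))       # drop AOA == 1 cells
new_sites <- spatSample(outside, size = 50, method = "random",
                        na.rm = TRUE, as.points = TRUE)
```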

How to install and run the app

To make it as simple as possible, we used Docker for development. The only thing necessary to run this software is to download this repository with git clone --recursive https://github.com/geo-tech-project/geotech.git and then run sudo docker-compose up in the command line interface. This command loads two images from Docker Hub, one for the frontend and one for the backend. Loading may take a while, as all dependencies (e.g., R packages) are loaded as well. After loading, the application starts automatically. It is accessible via your own IP address on port 8780, for example http://localhost:8780 or, for our AWS instance, http://35.80.3.64:8780.

How to use the app

Main tool

The main tool is designed so that the user can use it very easily. The user is guided step by step and can only proceed to the next step once the previous one has been completed correctly. For each step there is an additional info button that displays important information on hover. When everything has been entered successfully, the calculations can be started. After the calculations have finished without errors, the user is directed to the results page.

Demo

The demo page is structured exactly like the actual tool, but all inputs are pre-filled with default values. The user can view these entries but not change them; they can only start the calculations by clicking the Run demo button. The user should be redirected to the results page in less than 20 seconds.

Output of the results

On a new route, the following three results are visualised on a map:

  • Prediction: land use/land cover classification
  • Area of Applicability (AOA)
  • Suggested further training areas

Each result can be shown or hidden using a checkbox, and its transparency can be adjusted. The underlying satellite images on which the calculations are based are not displayed on the map, but they can be downloaded, like the other results, via a download button. Please note that the Sentinel image of the training areas can only be downloaded if training data has been submitted.

Unfortunately, the downloaded Sentinel images do not contain any band names. However, the bands correspond to the order in which they can be selected. Example: bands B03, B07 and B05 and the additional index BSI have been selected. The order in the TIFF is then: 1 = B03, 2 = B05, 3 = B07, 4 = BSI.
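For the example above, the band names can be restored after download with a one-liner in terra. The file path is a placeholder.

```r
# Reattach band names to a downloaded result TIFF (example from the text).
library(terra)

s2 <- rast("sentinel_aoi.tif")               # downloaded result (placeholder path)
names(s2) <- c("B03", "B05", "B07", "BSI")   # selected bands first, then indices
```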

How to test

To test this app you can proceed as follows:

  • Backend: with your CLI, go into the backend folder and run npm test.
  • Frontend: with your CLI, go into the frontend folder and run ng test.
  • R: the tests are written with the R package testthat.

Requirements:

  • Installation of R
  • Installation of all R packages used in this project
  • Installation of Node.js

Then proceed with the following steps:

  1. Make a clone of the backend repository
  2. Navigate into the backend/test folder
  3. Run node testR.js

Dependencies

The following packages are used in this project:

Frontend

Dev dependencies

Backend

  • axios: Promise based HTTP client for the browser and node.js
  • body-parser: Node.js body parsing middleware
  • chai: BDD/TDD assertion library for node.js and the browser. Test framework agnostic.
  • cors: Node.js CORS middleware
  • dotenv: Loads environment variables from .env file
  • express: Fast, unopinionated, minimalist web framework
  • mocha: simple, flexible, fun test framework
  • multer: Middleware for handling multipart/form-data.
  • ng2-file-upload: Angular component for uploading files to the server
  • nodemon: Simple monitor script for use during development of a node.js app.
  • r-integration: Simple portable library used to interact with pre-installed R compiler by running commands or scripts(files)
  • supertest: SuperAgent driven library for testing HTTP servers
  • swagger-ui-express: Swagger UI Express

R

  • terra: Spatial Data Analysis
  • rgdal: Bindings for the 'Geospatial' Data Abstraction Library
  • rgeos: Interface to Geometry Engine - Open Source ('GEOS')
  • rstac: Client Library for SpatioTemporal Asset Catalog
  • gdalcubes: Earth Observation Data Cubes from Satellite Image Collections
  • raster: Geographic Data Analysis and Modeling
  • caret: Classification and Regression Training
  • CAST: 'caret' Applications for Spatial-Temporal Models
  • lattice: Trellis Graphics for R
  • Orcs: Omnidirectional R Code Snippets
  • jsonlite: A Simple and Robust JSON Parser and Generator for R
  • tmap: Thematic Maps
  • latticeExtra: Extra Graphical Utilities Based on Lattice
  • doParallel: Foreach Parallel Adaptor for the 'parallel' Package
  • parallel
  • sp: Classes and Methods for Spatial Data
  • geojson: Classes for 'GeoJSON'
  • rjson: JSON for R
  • randomForest: Breiman and Cutler's Random Forests for Classification and Regression

Further documentation

The software can be split into two essential parts. The frontend was developed with the web framework Angular. The backend is set up as a Node.js application using the Express framework.

Frontend

Documentation of the frontend, written in Angular with HTML, CSS and TypeScript: Frontend

Backend

The backend can be divided into three parts. The first part is the R scripts that perform the actual operations, e.g. generating the Sentinel images or calculating the AOA. The second part is the API that establishes the connection between the backend and the frontend. The third part is the JavaScript code that sets up the API and connects it to the R part. Please note that the following links can only be used from the internal network of the University of Münster.

License

Copyright (C) 2022 Henning Sander, Frederick Bruch, Jakob Danel, Fabian Schumacher, Thalis Goldschmidt

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.