Authors: Marta Elvira, Roberto Calvo, Javier Becerra. PANOimagen S.L.
This repository contains a script for EO Browser designed to visualize 13-band Sentinel-2 data in a way that makes it easy to tell apart urban areas (red channel), vegetation areas (green channel) and water areas (blue channel). The coefficients used to combine the original bands into each channel were obtained with three LDA classifiers (i.e. one per channel).
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two machine learning techniques commonly used to reduce the dimensionality of large datasets; both produce linear combinations of the original variables. PCA decomposes a multivariate dataset into a set of successive orthogonal components that explain the maximum amount of variance in the data. LDA can be seen as a supervised counterpart of PCA: it uses class labels and finds the linear combinations that maximize the separation between the given classes.
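As a small illustration of this difference, the following plain-Python sketch compares the two directions on hypothetical 2-D toy data (not the authors' data or code). PCA takes the major axis of the pooled scatter and ignores the labels; two-class Fisher LDA takes w proportional to S_W^-1 (mu1 - mu0), where S_W is the within-class scatter. The scikit-learn LDA used for this script [1] also handles the multiclass case, which this two-class sketch does not attempt to reproduce.

```python
import math

# Hypothetical toy data (not from the SIOSE training set): both classes are
# spread widely along x but are separated only along y.
class0 = [(-3.0, -0.1), (-1.0, 0.1), (1.0, 0.1), (3.0, -0.1)]
class1 = [(-3.0, 0.9), (-1.0, 1.1), (1.0, 1.1), (3.0, 0.9)]


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, center):
    """Scatter matrix entries (sxx, sxy, syy) of the points around center."""
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    return sxx, sxy, syy


def within_class_scatter(c0, c1):
    """Pooled within-class scatter S_W = S_0 + S_1."""
    s0 = scatter(c0, mean(c0))
    s1 = scatter(c1, mean(c1))
    return tuple(a + b for a, b in zip(s0, s1))


def pca_direction(points):
    """Unsupervised: major axis (leading eigenvector) of the pooled scatter."""
    sxx, sxy, syy = scatter(points, mean(points))
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def lda_direction(c0, c1):
    """Supervised (Fisher): w proportional to S_W^-1 (mu1 - mu0), unit length."""
    (m0x, m0y), (m1x, m1y) = mean(c0), mean(c1)
    dx, dy = m1x - m0x, m1y - m0y
    sxx, sxy, syy = within_class_scatter(c0, c1)
    det = sxx * syy - sxy * sxy
    # Explicit inverse of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


def fisher_ratio(w, c0, c1):
    """Squared distance between projected means over projected within-class scatter."""
    (m0x, m0y), (m1x, m1y) = mean(c0), mean(c1)
    between = (w[0] * (m1x - m0x) + w[1] * (m1y - m0y)) ** 2
    sxx, sxy, syy = within_class_scatter(c0, c1)
    within = sxx * w[0] ** 2 + 2 * sxy * w[0] * w[1] + syy * w[1] ** 2
    return between / within


w_pca = pca_direction(class0 + class1)
w_lda = lda_direction(class0, class1)
print("PCA:", w_pca, fisher_ratio(w_pca, class0, class1))
print("LDA:", w_lda, fisher_ratio(w_lda, class0, class1))
```

On this data PCA returns (approximately) the x-axis, where the two classes overlap completely and the Fisher ratio is about 0, while LDA returns (approximately) the y-axis, where the Fisher ratio is about 12.5.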
We used LDA to create a visualization in which each image channel (red, green and blue) encodes as much information as possible for identifying urban, crop and water-related classes, respectively. Input class labels were taken from the Spanish SIOSE land-use classification. We thus created three different LDA transformations, one per component. Finally, because each component comes from a multiclass classifier (for instance, urban data is separated into non urban, urban, industrial...), we transform the resulting axis so that the center of the non-urban (resp. non-vegetation and non-water) class is translated to 0, and then take the absolute value to recover any urban (resp. vegetation, water) classes that may have ended up on the negative side of the axis.
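The re-centering and folding step can be sketched for a single channel as follows. The function name and toy scores are hypothetical, and taking the "center" of the non-class as the mean of its scores is an assumption; only the two operations (translate so the non-class center is 0, then take the absolute value) come from the description above.

```python
def recenter_scores(scores, labels, non_class):
    """Shift 1-D LDA scores so the non-class center sits at 0, then fold with abs()."""
    background = [s for s, lab in zip(scores, labels) if lab == non_class]
    if not background:
        raise ValueError("no samples carry the non-class label")
    center = sum(background) / len(background)  # assumption: center = mean score
    # Classes that landed on the negative side of the axis are folded back to positive.
    return [abs(s - center) for s in scores]


# Hypothetical 1-D scores for one channel (urban component).
scores = [1.5, 2.5, 2.0, -1.0, 5.5, 4.5]
labels = ["non-urban", "non-urban", "non-urban", "roads", "city", "industry"]
print(recenter_scores(scores, labels, "non-urban"))  # → [0.5, 0.5, 0.0, 3.0, 3.5, 2.5]
```

Here the non-urban center is 2.0, so the roads score of -1.0, which lies on the opposite side of the non-urban class from city and industry, becomes 3.0 instead of a dark negative value.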
The resulting transformation is visually attractive and makes it easy to differentiate urban, crop and water areas in Sentinel-2 images.
We built our model in Python using the LDA implementation of the scikit-learn library [1]. As input data, we used around 80 Sentinel-2 L1C images together with the SIOSE land-use classification. All the images were taken over the Spanish region of La Rioja (with dates ranging from 2015 to 2019). We have tested the estimated model in different regions of the world (as you can see in our collected examples) and found the results visually satisfying, even though the model was trained only on images from a small geographical region.
The script consists of three components (R, G and B); each one is created by applying a different LDA transformation that reduces the input bands to a single dimension for the corresponding color channel.
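Structurally, each channel reduces its own subset of bands (the subsets listed in the following paragraphs) to one value. The Python sketch below shows that structure, combined with the re-centering and absolute value described earlier. The weights, centers and gains are hypothetical placeholders rather than the trained LDA coefficients, the gain and the clamp to [0, 1] stand in for the unspecified contrast adjustment, and the actual script runs in EO Browser, not in Python.

```python
# Band subsets per output channel, as listed in the text.
CHANNEL_BANDS = {
    "red": ["B02", "B03", "B04", "B11", "B12"],           # urban component
    "green": ["B02", "B03", "B04", "B08", "B8A", "B11"],  # vegetation component
    "blue": ["B03", "B04", "B08", "B11"],                 # water component
}

# HYPOTHETICAL placeholder parameters for illustration only; these are NOT
# the trained LDA coefficients used by the actual EO Browser script.
PARAMS = {
    "red": {"weights": [0.5, 0.5, -1.0, 1.0, 0.5], "center": 0.2, "gain": 2.0},
    "green": {"weights": [0.0, 0.5, -1.5, 1.5, 0.5, -0.5], "center": 0.1, "gain": 2.0},
    "blue": {"weights": [1.0, 0.5, -1.5, -1.0], "center": -0.1, "gain": 2.0},
}


def channel_value(pixel, bands, weights, center, gain):
    """One channel: 1-D linear projection, re-center, fold with abs(), stretch, clamp."""
    if len(bands) != len(weights):
        raise ValueError("need exactly one weight per band")
    projection = sum(w * pixel[b] for w, b in zip(weights, bands))
    value = gain * abs(projection - center)
    return min(max(value, 0.0), 1.0)  # clamping to [0, 1] is an assumption


def to_rgb(pixel):
    """Apply the three per-channel transforms to a dict of band reflectances."""
    return tuple(
        channel_value(pixel, CHANNEL_BANDS[c], **PARAMS[c]) for c in ("red", "green", "blue")
    )


# Hypothetical reflectances for a single pixel.
pixel = {"B02": 0.08, "B03": 0.10, "B04": 0.12, "B08": 0.30, "B8A": 0.32, "B11": 0.25, "B12": 0.18}
print(to_rgb(pixel))
```

Keeping the three channels independent mirrors the one-LDA-per-channel design: each channel can be tuned or retrained without touching the other two.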
The first component (red) encodes urban areas. We selected the bands commonly used to visualize urban areas (B02, B03, B04, B11 and B12) [2]. The classes we use are city, industry, roads and non-urban.
The second component (green) captures vegetation areas. It uses bands B02, B03, B04, B08, B8A and B11, which are the bands used to compute vegetation-related indices and composites such as NDVI, false-color infrared, SAVI and BSI (see [2] and [3]). The classes we use are crops, forests, urban green areas, arable land and non-vegetation.
Finally, the third component (blue) captures water areas. We use bands B03, B04, B08 and B11, which are used to calculate indices such as NDSI (snow), NDWI (water) and NDGI (glaciers) [3]. The classes we use are rivers, lakes, seas and non-water.
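For reference, most of the indices named above are normalized differences of two bands, (a - b) / (a + b). The sketch below uses the band pairs commonly used for Sentinel-2 (NDVI: B08/B04, NDWI: B03/B08, NDSI: B03/B11, NDGI: B03/B04, plus SAVI with the usual soil factor L = 0.5). These are standard index definitions shown only to explain the band choices; the script itself does not compute them, since LDA learns its own band weights. The reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Generic (a - b) / (a + b); returns 0.0 when both inputs are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def spectral_indices(px, savi_l=0.5):
    """Common Sentinel-2 indices from a dict of band reflectances."""
    return {
        "NDVI": normalized_difference(px["B08"], px["B04"]),  # vegetation
        "SAVI": (1 + savi_l) * (px["B08"] - px["B04"]) / (px["B08"] + px["B04"] + savi_l),
        "NDWI": normalized_difference(px["B03"], px["B08"]),  # open water (McFeeters)
        "NDSI": normalized_difference(px["B03"], px["B11"]),  # snow
        "NDGI": normalized_difference(px["B03"], px["B04"]),  # glaciers
    }


# Hypothetical reflectances: a water-like pixel and a vegetation-like pixel.
water_px = {"B03": 0.25, "B04": 0.125, "B08": 0.0625, "B11": 0.03125}
veg_px = {"B03": 0.0625, "B04": 0.0625, "B08": 0.5, "B11": 0.25}
print(spectral_indices(water_px))
print(spectral_indices(veg_px))
```

The toy water pixel gets a positive NDWI and a negative NDVI, and the toy vegetation pixel the reverse.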
Before applying LDA, we balance the classes, that is, we randomly choose the same number of pixels for each class. In total, we used 4,409,380, 2,128,230 and 41,856 pixels to build the urban, vegetation and water components, respectively. After applying LDA, we center the LDA result within the 0-1 range and adjust the contrast so that the "non-class" values of each component become zero (using a linear transformation followed by the absolute value of each component).
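A minimal sketch of the balancing step follows, assuming that "the same number of pixels for each class" means randomly undersampling every class to the size of the rarest one (the function name and toy labels are hypothetical). If the totals above are summed over all classes of each component, equal per-class counts work out to 4,409,380 / 4 = 1,102,345 pixels per urban class, 2,128,230 / 5 = 425,646 per vegetation class and 41,856 / 4 = 10,464 per water class.

```python
import random
from collections import Counter, defaultdict


def balance_classes(samples, labels, seed=0):
    """Randomly undersample every class down to the size of the rarest class."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    n = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    out_samples, out_labels = [], []
    for label in sorted(by_class):
        for sample in rng.sample(by_class[label], n):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy pixels (indices) labelled with the water-component classes.
pixels = list(range(13))
labels = ["non-water"] * 6 + ["rivers"] * 2 + ["lakes"] * 2 + ["seas"] * 3
xs, ys = balance_classes(pixels, labels)
print(Counter(ys))
```

Each of the four classes keeps 2 pixels here, so the classifier is not dominated by the abundant non-water class.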
Given how the script is designed, urban areas are expected to appear in red, vegetation in green tones and water in blue, and this is indeed the script's typical behaviour. However, some crops also appear red, and some rivers may appear in tones other than blue.
In general, the colors for each zone are:
- Urban areas: pink, orange.
- Industrial: bright purple.
- Crops: purple and bright green tones.
- Forests: dark greens.
- Non-vegetated areas: yellow and pink.
- Water: blue.
- Snow: white.
Below we show an image of Madrid on 26-09-2019. More images can be found on the examples page.
[1] Scikit-learn, Linear Discriminant Analysis (LDA). Accessed on November 2019.
[2] GitHub repository, Collection of custom scripts. Accessed on November 2019.
[3] List of spectral indexes for Sentinel and Landsat. Accessed on December 2019.
[4] Borràs, J. & Delegido, Jesús & Pezzola, Alejandro & Pereira, M. & Morassi, G. & Camps-Valls, Gustau. (2017). Clasificación de usos del suelo a partir de imágenes Sentinel-2 [Land use classification from Sentinel-2 images]. Revista de Teledetección. Accessed on November 2019.
[5] Pirotti, Francesco & Sunar, Filiz & Piragnolo, M. (2016). Benchmark of machine learning methods for classification of a Sentinel-2 image. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Accessed on December 2019.
[6] Anaya Isaza, Andres & Peluffo, Diego & Alvarado Pérez, Juan & Rios, Jorge & Castro Silva, Juan Antonio & Rosero, Paul & Peña, Diego & Salazar Castro, Jose & Umaquinga, Ana. (2016). Estudio comparativo de métodos espectrales para reducción de la dimensionalidad: LDA versus PCA [Comparative study of spectral methods for dimensionality reduction: LDA versus PCA]. Accessed on December 2019.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.