Workflow
The data we use and how we carry out the analysis.
Step 0 - Data Presentation
Presentation of the data used for the analysis: DUSAF, DTM, NDVI, distance characterization data, and the landslide inventory.
DUSAF (Destinazione d'Uso dei Suoli Agricoli e Forestali) is the land use and land cover database of the Lombardy Region, which distinguishes buildings, roads, and other land cover classes. By including DUSAF among the source layers sampled for the training dataset, we can capture how human activities modify the natural environment and influence landslide risk.
The Normalized Difference Vegetation Index (NDVI) is an indicator of the greenness of a biome; it is not a physical attribute, but it is widely used to monitor ecosystems. The NDVI is calculated as:
\( \text{NDVI} = \frac{\text{NIR} - \text{RED}}{\text{NIR} + \text{RED}} \) where NIR and RED are the spectral reflectances measured in the near-infrared and red wavebands, respectively. The index ranges from -1 to +1, with higher values indicating healthier and denser vegetation. NDVI plays an important role in landslide susceptibility mapping, since the vegetation cover it indicates affects slope stability and therefore the outcome of the landslide analysis.
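As a sanity check, the formula above can be written as a small function. This is only a sketch: it assumes band reflectances are given as values in [0, 1] and guards against the zero-denominator case, which the plain formula leaves undefined.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI from near-infrared and red reflectance values."""
    if nir + red == 0:
        return 0.0  # assumption: treat the undefined 0/0 case as 0
    return (nir - red) / (nir + red)

# Dense vegetation reflects strongly in NIR and absorbs red light:
print(ndvi(0.50, 0.08))  # high positive value, healthy vegetation
# Bare soil or rock gives values near zero:
print(ndvi(0.30, 0.25))
```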
A Digital Terrain Model (DTM) is raster data representing the terrain surface, consisting of XYZ coordinates that provide detailed elevation information. From the elevation we can derive various topographic attributes such as slope angle, aspect, and plan and profile curvature. These attributes are critical in many geospatial analyses, including landslide susceptibility mapping. In our case, we use the derived layers listed above to compute the no landslide zones for training.
NDVI
Because NDVI data are missing in some areas, we delineated those areas during preprocessing and excluded them when generating the training set.
DUSAF
DTM
Distance-related data, including distance to roads, to the river network, and to geological faults, have an important impact on landslides and need to be taken into account when predicting them.
Roads, obtained from OpenStreetMap as vector data, are analyzed by creating buffer zones at distances of 50, 100, 250, and 500 meters, and beyond. These buffer zones help in understanding the influence of roads on nearby slopes.
Rivers, mapped as vector data from OpenStreetMap, are analyzed by creating buffer zones at the same distances (50, 100, 250, and 500 meters, and beyond). These buffer zones help quantify the impact of rivers on slopes, since rivers erode the base of slopes, which can eventually trigger landslides.
Geological faults can create zones of weakness in the earth's crust, making the surrounding terrain more susceptible to landslides. We use the fault data from GeoPortale Lombardia, analyzed by creating buffer zones at distances of 50, 100, 250, and 500 meters, and beyond. These buffer zones help in evaluating the impact of faults on the surrounding areas.
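The same buffer scheme is used for all three distance layers. As an illustration only (a sketch with a hypothetical helper name, not part of the QGIS workflow), classifying a distance into these buffer rings looks like:

```python
# Buffer break distances in meters, as used for roads, rivers, and faults.
BREAKS = [50, 100, 250, 500]

def buffer_class(distance_m: float) -> str:
    """Map a distance to a feature into the buffer ring it falls in."""
    for b in BREAKS:
        if distance_m <= b:
            return f"<= {b} m"
    return "> 500 m"  # beyond the outermost buffer

print(buffer_class(80))   # a point 80 m from a road falls in the 100 m ring
print(buffer_class(700))  # outside all buffer rings
```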
Landslides Inventory in the area of Group 7
In this case study, we used past landslide data as part of the training data, since "the past and present are the key to the future". By definition, landslide inventory data should include the type of landslide, the time of occurrence, the activity status, and the depth of the landslide. We use data from the IFFI catalogue, with each landslide represented as a point, a line, or a polygon. When a landslide is represented as a point, the point is located at the landslide crown. When the landslide surface can be mapped at the survey scale, it is represented by a line or a polygon. Polygons include debris flows, translational or rotational slides, and all linear landslides.
Data Preprocessing
Data preprocessing includes reprojection, clipping, resampling, and rasterization.
During data preprocessing, we must ensure that the coordinate reference system, extent, and resolution of all layers are consistent with the DTM layer to obtain data consistency when generating the training set.
| Parameter | Value |
|---|---|
| Coordinate Reference System (CRS) | EPSG:32632 - WGS 84 / UTM zone 32N |
| Pixel Resolution | 5 meters |
| Extent | Vector polygon of the area of Group 7 |
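The consistency requirement can be summarised as a small check. This is illustrative only: the reference values come from the table above, and the dicts stand in for real layer metadata that would be read from the files.

```python
# Reference grid taken from the DTM layer (values from the table above).
REFERENCE = {"crs": "EPSG:32632", "resolution": 5.0}

def is_aligned(layer_meta: dict) -> bool:
    """True when a layer matches the DTM's CRS and pixel resolution."""
    return (layer_meta["crs"] == REFERENCE["crs"]
            and layer_meta["resolution"] == REFERENCE["resolution"])

# A layer already warped to the project grid passes the check:
print(is_aligned({"crs": "EPSG:32632", "resolution": 5.0}))   # True
# A raw layer in geographic coordinates needs reprojection first:
print(is_aligned({"crs": "EPSG:4326", "resolution": 0.0001})) # False
```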
Step 1 - Data Processing
QGIS Analysis: Slope, Aspect, Plan Curvature and Profile Curvature
To define the no landslide zones, we need to extract four factors from the DTM: slope angle, aspect, and plan and profile curvature. To extract them, we use the Processing-SAGA-Slope, aspect, curvature tool in QGIS, with the units set to degrees.
Slope represents how much the elevation changes over a given horizontal distance. The slope angle is the angle between the slope surface and the horizontal plane, measured in degrees; it quantifies the steepness of the slope.
Aspect describes the compass direction that a slope faces.
Plan curvature is the curvature of the surface in the horizontal plane. It indicates the rate of change of aspect along a contour line and affects water flow and erosion.
Profile curvature is the curvature of the surface in the vertical plane. It reflects the rate of change of slope along a flow line and influences the acceleration and deceleration of water flow.
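For intuition, slope and aspect follow directly from the local elevation gradient. The sketch below uses one common convention (aspect as the compass bearing of the downslope direction, 0° = north, clockwise); SAGA's implementation may differ in details, so treat this as illustrative.

```python
import math

def slope_deg(dz_dx: float, dz_dy: float) -> float:
    """Slope angle in degrees from the elevation gradient (m/m)."""
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def aspect_deg(dz_dx: float, dz_dy: float) -> float:
    """Compass bearing of the downslope direction (0 = N, 90 = E)."""
    # The gradient points uphill, so the downslope vector is its negation.
    return math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0

# A surface rising 1 m per metre towards the north:
print(slope_deg(0.0, 1.0))   # 45.0 degrees
print(aspect_deg(0.0, 1.0))  # 180.0 (the slope faces south)
```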
Define the No Landslide Zones (NLZ)
No landslide zones are areas with a low probability of landslides. In our case study, we adopt a simplified definition derived from the slope angle: areas with slopes below 20 degrees or above 70 degrees are generally less prone to landslides.
- To obtain the NLZ layer, we follow these three steps:
- We use the raster calculator on the slope layer derived from the DTM to compute "slope@1" < 20 OR "slope@1" > 70. In the resulting raster layer, No Landslide Zones correspond to pixels with a value of 1.
- Use r.null to remove null values from the resulting raster, then use the Processing-GDAL-Raster analysis-Sieve tool to filter out the small patches. We tried four different sieve thresholds: 10, 30, 50, and 70. For our study area, a threshold of 30 works best.
- Vectorize the resulting raster to obtain the NLZ polygons. To achieve this, we use the Processing-GRASS-Raster-r.to.vect tool, with the raster values set as categories.
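The thresholding step in the raster calculator is equivalent to the following sketch on a small slope grid (values in degrees), where 1 marks an NLZ pixel and 0 everything else:

```python
def nlz_mask(slope_grid):
    """Apply the NLZ rule: slope < 20 OR slope > 70 (degrees)."""
    return [[1 if (s < 20 or s > 70) else 0 for s in row]
            for row in slope_grid]

slope = [[10, 35, 80],
         [25, 15, 40]]
print(nlz_mask(slope))  # [[1, 0, 1], [0, 1, 0]]
```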
Final Vectorized No Landslide Zones (NLZ)
Sieved with threshold 10
too many useless small scattered patches remain
Sieved with threshold 30
removed most of the excessively small scattered patches
Sieved with threshold 50
relatively larger blocks have been retained
Sieved with threshold 70
filtered out too many blocks
Combine NLZ and LZ Dataset
- Difference
Since we adopt the simplified definition of no landslide zones, the NLZ polygons may overlap with the landslide inventory polygons. We use the Processing-Vector Overlay-Difference tool to remove the overlapping parts of the two datasets.
- Define 'Hazard'
To prepare the training/testing dataset, we create a new field 'Hazard' in the attribute tables of both the Landslide Inventory and the NLZ, and assign 1 to the NLZ and 2 to the Landslide Inventory.
- Union
After this, we perform a union operation on the Landslide Inventory and NLZ polygons.
- Manual Intervention
To ensure the training and validation data are evenly distributed, we manually modified the vector layer produced by the union operation. For a detailed description of this issue, see the "Problem Encountered" section.
| Layer | Hazard Value |
|---|---|
| No Landslide Zones | 1 |
| Landslides Inventory | 2 |
Separation of Train_Test Dataset
- The procedure for creating the training data is as follows:
- For both the landslide inventory and NLZ layers, define a training/testing ratio to be used for the machine learning model. Here we tried both 80/20 and 70/30 splits. Use Processing-Vector Selection-Random Selection to randomly select polygons according to the given ratio, and invert the selection to switch between training and testing polygons.
- Merge the processed Landslide Inventory layer with the NLZ layer.
- After determining how many points are needed for the training data, we select points within the polygons according to the training and validation ratios, while ensuring a 1:1 point ratio between the two 'Hazard' values. We use the Select Features by Value tool, selecting on the 'Hazard' and 'Train_Test' fields. After the selection, use Processing-Vector Creation-Random Points in layer bounds on the merged Landslide Inventory and NLZ layer, with "selected features only" enabled.
- Merge the training and testing selections separately into two point layers, trainingPoints and testingPoints.
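The splitting logic can be sketched in plain Python (the point lists below are hypothetical; in QGIS the Random Selection tool does this per layer). Splitting each hazard class separately is what preserves the 1:1 class balance in both the training and testing sets:

```python
import random

def split_points(points, train_ratio=0.7, seed=42):
    """Randomly split a list of points into (train, test) by ratio."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * train_ratio)
    return shuffled[:k], shuffled[k:]

# 500 candidate points per class, split 70/30 class-by-class:
nlz_pts = [("NLZ", i) for i in range(500)]  # Hazard = 1
ls_pts  = [("LS", i) for i in range(500)]   # Hazard = 2
train = split_points(nlz_pts)[0] + split_points(ls_pts)[0]
test  = split_points(nlz_pts)[1] + split_points(ls_pts)[1]
print(len(train), len(test))  # 700 300, matching the table below
```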
1000 points with a 70/30 Train/Test Ratio
| | Hazard Value | Number of Points |
|---|---|---|
| Train | 1 | 350 |
| Train | 2 | 350 |
| Test | 1 | 150 |
| Test | 2 | 150 |
Step 2 - Susceptibility Mapping Process
Hazard classification using dzetsaka
- Training procedure
Before training, we need to remove NULL values from all dataset attributes by using the Field Calculator to update each attribute column, replacing nulls with 9999. We use the dzetsaka plugin to classify hazard. For each group of training/testing data, we choose Random Forest as the classifier and generate a virtual raster layer containing all the raster layers used for training.
- Probability Map
To describe the probability of landslide occurrence and create a probability map, we convert the classification confidence into a class probability using the raster calculator with the following expression:
("classification@1" = 2) * "confidence@1" + ("classification@1" = 1) * (100 - "confidence@1")
This yields the probability map derived from the training result.
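The raster calculator expression amounts to the following per-pixel rule (assuming confidence is reported on a 0-100 scale, with class 2 = landslide and class 1 = no landslide, as in our Hazard coding):

```python
def landslide_probability(cls: int, confidence: float) -> float:
    """Per-pixel landslide probability from class + confidence (0-100).

    A pixel confidently classified as landslide (class 2) keeps its
    confidence; a pixel confidently classified as stable (class 1)
    gets the complement.
    """
    return confidence if cls == 2 else 100.0 - confidence

print(landslide_probability(2, 90.0))  # 90.0 - confidently landslide
print(landslide_probability(1, 90.0))  # 10.0 - confidently stable
```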
Step 3&4 - Exposure assessment
Population Analysis
- Data preprocessing
For the exposure assessment, we use the WorldPop raster data, a spatial raster dataset with the estimated total number of people per grid cell. We adopted the same workflow as in the data preprocessing stage: first reproject the population raster to WGS 84 / UTM zone 32N (EPSG:32632) to keep the CRS consistent across datasets, then clip the population raster with the Group 7 mask vector to obtain the population raster for the exposure assessment. - Ranking the Degree
To rank the degree of exposure, we reclassify the susceptibility raster map into 4 classes:

| Susceptibility | Class |
|---|---|
| [0, 0.25) | low |
| [0.25, 0.5) | moderate |
| [0.5, 0.75) | high |
| [0.75, 1) | very high |

We use the Reclassify by table tool to perform the classification. Then we resample the reclassified susceptibility raster map to the extent and resolution of the clipped population raster dataset, which is around 81.67 m. - Analysis with csv
To compute the population counts in each susceptibility class, we use the tool Processing > Raster Analysis > Raster layer zonal statistics. Set the input layer to the clipped population raster dataset and the zones layer to the resampled susceptibility raster map. Finally, plot a pie chart showing the percentage of the population in each susceptibility class.
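Conceptually, the reclassification and zonal statistics combine into a per-pixel aggregation like the sketch below (pure Python on flattened arrays; the real computation runs on the rasters in QGIS):

```python
# Class breaks follow the reclassification table: [0, 0.25) low, etc.
CLASSES = {1: "low", 2: "moderate", 3: "high", 4: "very high"}

def reclass(p: float) -> int:
    """Map a susceptibility value in [0, 1) to one of the four classes."""
    return min(int(p // 0.25) + 1, 4)

def zonal_population(susceptibility, population):
    """Sum the population falling in each susceptibility class."""
    totals = {name: 0.0 for name in CLASSES.values()}
    for p, pop in zip(susceptibility, population):
        totals[CLASSES[reclass(p)]] += pop
    return totals

# Four example pixels: susceptibility value and estimated head count.
print(zonal_population([0.1, 0.3, 0.8, 0.9], [100, 50, 20, 30]))
```

The resulting per-class totals are what the pie chart visualises.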
Population Map
Reclassified Landslide Susceptibility Map
Resampled to the same pixel resolution as the population raster
Alpine Pastures
- Considering that the study area contains few buildings, and given the potential hazards posed by landslides, we chose the category of Alpine Pastures for the exposure assessment (Alpine Pasture data source: Geoportale della Lombardia).
- There are four alpine pastures in the study area: Alpe Meden, Alpe Rhon con Campondola e Campo, Alpe Piano-Ortiche con Aiada, and Alpe Piano dei Cavalli con Malgina e Combolo. We rasterized these areas using the DTM of the study area as the reference grid. Then, using a method similar to the one applied to the population data, we performed the calculations with the reclassified landslide susceptibility map. Finally, we obtained the exposure assessment for the alpine pastures.
Alpine Pastures in our Area
Step 5 - WebGIS
- In the WebGIS section, we use the OSM basemap and Stadia Maps layers as basemaps. We published the clipped original data, the landslide susceptibility map, the population map, and the exposure assessment maps on the Polimi GeoServer. The maps are displayed via WMS services, and a pop-up feature has been added for the raster layer currently displayed on top: clicking on a point shows the corresponding grayscale value.