The French Lidar HD project aims to map France in 3D using 10 pulses/m² aerial Lidar. The data will be openly available, including a semantic segmentation with a minimal set of classes: ground, vegetation, buildings, vehicles, bridges, others.
Here we train multiclass segmentation models that can serve as base models for further segmentation tasks on French Lidar HD data.
The goal is to be somewhat data-agnostic yet opinionated, with default configurations for different national Lidar data specifications.
This repository provides scripts that tackle the following tasks:
- `train.py`: trains the semantic segmentation neural network on aerial Lidar point clouds.
- `predict.py`: applies a trained model to unseen data.
Code is packaged for easy deployment (see below). Trained models are not publicly hosted at the moment.
This Lidar Segmentation repository is heavily based on the following template. Please refer to its README for documentation on its general logic.
# clone project
git clone https://github.com/IGNF/lidar-deep-segmentation
cd lidar-deep-segmentation
# [OPTIONAL] If you want to use a GPU, make sure the CUDA toolkit is installed
sudo apt install nvidia-cuda-toolkit
# install anaconda
# see https://www.anaconda.com/products/individual
# create conda environment - modify script to adapt versions and cudatoolkit to your own infrastructure.
source bash/setup_environment/setup_env.sh # with cudatoolkit
source bash/setup_environment/setup_env_cpu_only.sh # cpu version only
# activate using
conda activate lidar_deep_segmentation
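To check that the GPU environment is functional, a quick sanity check (assuming PyTorch was installed by the setup script) is:

```python
# Quick sanity check that the environment's PyTorch build can see a CUDA GPU.
import torch

print(torch.__version__)
print(torch.cuda.is_available())  # expected: True on a GPU setup with cudatoolkit
```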
If you are interested in running inference from anywhere, you can install the code as a package in your virtual environment.
# activate an env matching the requirements of bash/setup_environment/setup_env.sh
conda activate lidar_deep_segmentation
# install the package
pip install --upgrade https://github.com/IGNF/lidar-deep-segmentation/tarball/main # from github directly
pip install -e . # from local sources
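To verify that the package resolves from outside the source tree, a minimal check is:

```python
# Minimal check that the installed package is importable from anywhere.
import lidar_multiclass

print(lidar_multiclass.__file__)  # location of the installed package
```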
To run inference, you will need:
- A source point cloud in LAS format on which to infer new classes and probabilities.
- A checkpoint of a trained lightning module implementing the model logic (class `lidar_multiclass.models.model.Model`); a Python loading sketch follows this list.
- A minimal YAML configuration specifying parameters. We use hydra to manage configurations, and this YAML file results from the model training. The `datamodule` and `model` parameter groups must match dataset characteristics and model training settings. The `predict` parameter group specifies the paths to the model and data, the batch size (N=50 works well; the larger, the faster), and the optional use of a GPU.
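For reference, the checkpoint is a regular PyTorch Lightning checkpoint, so it can also be restored in Python with Lightning's generic loading mechanism. This is only a sketch, assuming the hyperparameters were saved in the checkpoint during training; the recommended workflow remains the CLI command below.

```python
# Sketch: restore the trained lightning module from its checkpoint using
# PyTorch Lightning's generic load_from_checkpoint. Assumes hyperparameters
# were saved alongside the weights during training.
from lidar_multiclass.models.model import Model

model = Model.load_from_checkpoint("/path/to/checkpoint.ckpt")
model.eval()  # switch to inference mode
```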
Fill out the {missing parameters} and run:
python -m lidar_multiclass.predict --config-path {/path/to/.hydra} --config-name {config.yaml} predict.src_las={/path/to/cloud.las} predict.output_dir={/path/to/out/dir/} predict.resume_from_checkpoint={/path/to/checkpoint.ckpt} predict.gpus={0 for none, [i] to use GPU number i} datamodule.batch_size={N} hydra.run.dir={path/for/hydra/logs}
To show the current inference config, simply add a `--help` flag:
python -m lidar_multiclass.predict --config-path {/path/to/.hydra} --config-name {config.yaml} --help
TODO: add control over where hydra log files are saved.
Some environment variables are injected at runtime and need to be specified in a `.env` file. Rename `.env_example` to `.env` and fill out:
- `LOG_PATH`, where hydra logs and config are saved.
- The `DATAMODULE` section, which specifies where to look for training data.
- The `LOGGER` section, which specifies the credentials needed for logging to comet.ml. Alternatively, logging can be disabled by setting `logger=null` at runtime.
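For reference, a common way for such variables to reach the program at runtime is to load the `.env` file at startup, for example with python-dotenv; whether this exact mechanism is used here is an assumption, and the sketch below is only illustrative.

```python
# Sketch: loading .env variables at startup, assuming the python-dotenv package.
# LOG_PATH is the variable described above; other names should be taken from
# .env_example.
import os
from dotenv import load_dotenv

load_dotenv(".env")           # reads KEY=VALUE pairs into os.environ
print(os.getenv("LOG_PATH"))  # directory where hydra logs and config are saved
```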
For training, input point clouds need to be split into chunks that can be digested by segmentation models. We found 50m x 50m to be a good balance between the model's receptive field and capacity. A specific preparation is needed, described in the Data preparation section; a minimal tiling sketch is also given below.
The expected file structure is summarized in `.env`.
More detailed documentation on how to create a compatible, training-ready dataset from Swiss data is given in this repo.
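To make the 50m x 50m chunking concrete, here is a minimal tiling sketch using laspy; it only illustrates the idea and is not the project's data preparation pipeline (paths are placeholders):

```python
# Sketch: split a large LAS file into 50m x 50m tiles with laspy (>= 2.0).
# Illustrative only; not the official data preparation pipeline.
import numpy as np
import laspy

TILE_SIZE = 50.0  # metres

las = laspy.read("/path/to/large_cloud.las")
tile_x = np.floor(las.x / TILE_SIZE).astype(int)
tile_y = np.floor(las.y / TILE_SIZE).astype(int)

for ix, iy in set(zip(tile_x, tile_y)):
    mask = (tile_x == ix) & (tile_y == iy)
    tile = laspy.create(point_format=las.header.point_format,
                        file_version=las.header.version)
    tile.header.offsets = las.header.offsets  # keep the original coordinate system
    tile.header.scales = las.header.scales
    tile.points = las.points[mask]
    tile.write(f"tile_{ix}_{iy}.las")
```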
Once you have the data, define your experiment settings in an experiment file in the `configs/experiment` folder.
To try out your setting by overfitting on a single batch of a Swiss dataset, run
python run.py experiment=RandLaNetDebug.yaml
After training, your model's best checkpoints and the hydra config will be saved in a `DATE/TIME/` subfolder of the `LOG_PATH` you specified, with the associated hydra `config.yaml`.
From the line for package-based inference above, simply change `python -m lidar_multiclass.predict` to `python run.py` to run directly from sources.
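For instance, the full command from the inference section becomes (same placeholders):
python run.py --config-path {/path/to/.hydra} --config-name {config.yaml} predict.src_las={/path/to/cloud.las} predict.output_dir={/path/to/out/dir/} predict.resume_from_checkpoint={/path/to/checkpoint.ckpt} predict.gpus={0 for none, [i] to use GPU number i} datamodule.batch_size={N} hydra.run.dir={path/for/hydra/logs}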
In case you want to switch to package-based inference, you will need to comment out the parameters that depend on local environment variables, such as logger credentials and the training data directory. You can do so by making a copy of the `config.yaml` file and commenting out the lines containing `oc.env` logic.
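To illustrate what such a line looks like, here is a minimal standalone example of OmegaConf's `oc.env` resolver; the `data_dir` key is hypothetical, and the real keys are the ones found in your exported `config.yaml`.

```python
# Minimal illustration of OmegaConf's oc.env resolver, which substitutes an
# environment variable when the config is resolved. The key name is hypothetical.
import os
from omegaconf import OmegaConf

os.environ["DATA_DIR"] = "/path/to/data"  # normally provided via the .env file
cfg = OmegaConf.create({"data_dir": "${oc.env:DATA_DIR}"})
print(OmegaConf.to_container(cfg, resolve=True))  # {'data_dir': '/path/to/data'}
```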
The logic for data pre-processing lives in `lidar_multiclass/datamodules/data.py`, both offline and online, i.e. saving preprocessed data objects for fast training vs. pre-processing at inference time.
The loading function is dataset-dependent; there is currently logic for both SwissTopo data (without infrared channel) and French IGN data (with infrared channel).
For help, run
python lidar_multiclass/datamodules/data.py -h
Currently, two sources are supported:
- French Lidar HD, produced by the French geographical institute (IGN). The data is colorized with both RGB and infrared. Therefore, data processing includes the infrared channel as well as NDVI (see the sketch after this list).
- Swiss Lidar from SwissSurface3D (en), a similar initiative from the Swiss geographical institute SwissTopo. The data comes from the SwissSurface3D Lidar database and is not colorized, so we have to join it with the SwissImage10 orthoimage database. The procedure is described in this standalone repository.
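As an illustration of the NDVI mentioned above: it is derived from the red and near-infrared channels as (NIR - Red) / (NIR + Red). A minimal NumPy sketch with placeholder values:

```python
# Sketch: Normalized Difference Vegetation Index (NDVI) from red and
# near-infrared values. The arrays below are placeholder values standing in
# for the per-point colour channels of a colorized point cloud.
import numpy as np

red = np.array([0.20, 0.35, 0.10])       # hypothetical red reflectances
nir = np.array([0.60, 0.40, 0.55])       # hypothetical near-infrared reflectances

ndvi = (nir - red) / (nir + red + 1e-8)  # epsilon avoids division by zero
print(ndvi)
```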