This section provides an overview of the steps involved in training the policy. For data generation, please follow the instructions given here.
Cost-Map building is an essential step in guiding the optimization and representing the environment. Cost-Maps can be built either from depth and semantic images (i.e., data generated in simulation) or from (semantically annotated) point clouds (i.e., real-world data).
If depth and semantic images from the simulation are available, a 3D reconstruction has to be performed first, following the steps described in Point 1. If (semantically annotated) point clouds are available, the cost-map can be built directly from the point cloud, following the steps described in Point 2.
1. **Simulation: Depth Reconstruction**
The reconstruction is executed in two steps, controlled by the config parameters defined in the `ReconstructionCfg` class (a minimal sketch of both steps is given after the list):
- Generate a colored point cloud by warping each semantic image onto the depth image (accounting for cameras in different frames)
- Project into 3D space and voxelize
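For intuition, here is a minimal, self-contained sketch of these two steps using NumPy and Open3D. It assumes the semantic image is already aligned with the depth image (the cross-camera warping is omitted) and uses illustrative names rather than the actual `ReconstructionCfg` interface:

```python
import numpy as np
import open3d as o3d

def depth_to_voxelized_cloud(depth, semantics, K, T_world_cam, voxel_size=0.1):
    """Back-project a depth image, color it with semantics, and voxelize.

    Assumes semantics is already warped into the depth camera frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.reshape(-1)
    valid = z > 0  # discard invalid depth returns

    # Pinhole back-projection into the camera frame
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)[valid]

    # Transform into the world frame using the camera extrinsics
    pts_world = pts_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]

    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(pts_world)
    cloud.colors = o3d.utility.Vector3dVector(
        semantics.reshape(-1, 3)[valid] / 255.0  # semantic colors in [0, 1]
    )
    # Voxelization: points falling into the same voxel are merged
    return cloud.voxel_down_sample(voxel_size)
```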
The process expects the following data structure:
```
env_name
├── camera_extrinsic.txt  # format: x y z qx qy qz qw
├── intrinsics.txt        # expects ROS CameraInfo format --> P-Matrix
├── depth                 # either png and/or npy; if both, npy is used
│   ├── xxxx.png          # images saved with 4 digits, e.g. 0000.png
│   └── xxxx.npy          # arrays saved with 4 digits, e.g. 0000.npy
└── semantics             # optional
    └── xxxx.png          # images saved with 4 digits, e.g. 0000.png
```
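Since the intrinsics follow the ROS CameraInfo convention, the P-Matrix is a 3x4 projection matrix whose left 3x3 block is the pinhole matrix K used for back-projection. A small sketch to recover K, assuming `intrinsics.txt` stores the 12 entries of P as plain numbers:

```python
import numpy as np

# Load the 3x4 P-Matrix (ROS CameraInfo convention); the plain-text
# layout of intrinsics.txt is an assumption.
P = np.loadtxt("env_name/intrinsics.txt").reshape(3, 4)
K = P[:, :3]  # fx = K[0, 0], fy = K[1, 1], cx = K[0, 2], cy = K[1, 2]
```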
In the case that the semantic and depth images have an offset in their position (as is typical on some robotic platforms), define a `sem_suffix` and `depth_suffix` in `ReconstructionCfg` to differentiate between the two, with the following structure:

```
env_name
├── camera_extrinsic{depth_suffix}.txt  # format: x y z qx qy qz qw
├── camera_extrinsic{sem_suffix}.txt    # format: x y z qx qy qz qw
├── intrinsics.txt                      # P-Matrix for intrinsics of depth and semantic images (depth first)
├── depth                               # either png and/or npy; if both, npy is used
│   ├── xxxx{depth_suffix}.png          # images saved with 4 digits, e.g. 0000.png
│   └── xxxx{depth_suffix}.npy          # arrays saved with 4 digits, e.g. 0000.npy
└── semantics                           # optional
    └── xxxx{sem_suffix}.png            # images saved with 4 digits, e.g. 0000.png
```
2. **Real-World: Open3D-Slam**
To create an annotated 3D point cloud from real-world data, i.e., LiDAR scans and semantics generated from the RGB camera stream, use tools such as Open3D-Slam.
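A quick way to verify the exported cloud before cost-map building is to load it with Open3D and check that per-point colors (the semantic annotation) are present; the file name here is illustrative:

```python
import open3d as o3d

# Load the (semantically annotated) point cloud; the path is illustrative
cloud = o3d.io.read_point_cloud("env_name/annotated_cloud.ply")
assert cloud.has_colors(), "expected semantic annotation stored as point colors"
print(f"loaded {len(cloud.points)} points")
```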
3. **Cost-Building**
Either a geometric or a semantic cost map can be generated by running the following command:

```
python viplanner/cost_builder.py
```

with the configuration set in `CostMapConfig`. We provide some standard values; however, before running the script, please adjust the config to your needs and local environment paths.
Cost-Maps will be saved within the environment folder, with the following structure:
```
maps
├── cloud
│   └── cost_{map_name}.txt          # 3d visualization of cost map
├── data
│   ├── cost_{map_name}_map.txt      # cost map
│   └── cost_{map_name}_ground.txt   # ground height estimated from point cloud
└── params
    └── config_cost_{map_name}.yaml  # CostMapConfig used to generate cost map
```
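The generated artifacts can be inspected directly; assuming the `.txt` files are plain whitespace-separated numeric arrays, they load with NumPy:

```python
import numpy as np

map_name = "example"  # placeholder for your map name
# 2D grid of traversal costs and the per-cell ground height estimate
cost_map = np.loadtxt(f"maps/data/cost_{map_name}_map.txt")
ground = np.loadtxt(f"maps/data/cost_{map_name}_ground.txt")
print(cost_map.shape, ground.shape)
```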
The training configuration is given in `TrainCfg`. Training can be started using the example training script `train.py`:

```
python viplanner/train.py
```
For training, the following directory structure is expected and will be created:
```
file_path  # TrainCfg.file_path or env variable EXPERIMENT_DIRECTORY
├── data
│   └── env_name  # structure as defined in Cost-Map Building
├── models
│   └── model_name
│       ├── model.pt    # trained model
│       └── model.yaml  # TrainCfg used to train the model
└── logs
    └── model_name
```
It is important that the model name is unique; otherwise, the previous training will be overwritten. Also, always copy both `model.pt` and `model.yaml`, because the config is necessary to reload the model.
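As a rough sketch of why both files matter when reloading, the YAML config describes how to rebuild the network before the weights can be restored; the model construction step below is a placeholder, since the actual loader lives in the viplanner package:

```python
import torch
import yaml

# Read the TrainCfg values that were used during training
with open("models/model_name/model.yaml") as f:
    train_cfg = yaml.safe_load(f)

# Load the trained weights (assuming model.pt stores a state dict)
state_dict = torch.load("models/model_name/model.pt", map_location="cpu")

# model = build_policy(train_cfg)    # placeholder: construct the network from the config
# model.load_state_dict(state_dict)  # then restore the trained weights
```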