-
Notifications
You must be signed in to change notification settings - Fork 345
ROS Nodes
This document describes the ROS nodes that are used or implemented by the project.
The camera node (gscam
) publishes to the /camera/image_raw
topic using the standard ROS Image message. This is the master source which provides information to other nodes like the TrailNet DNN, and obstacle avoidance.
The caffe_ros
node provides support for DNN inference using TensorRT library. This node currently supports Caffe models such as classification (TrailNet), YOLO-based object detection, regression and semantic segmentation. The node produces messages in different formats depending on the type of the network and post-processing options (e.g. TrailNet vs YOLO).
The timestamp in the header of the message is the same as the timestamp from the corresponding camera message. This simplifies synchronization of the image frames with DNN output and other nodes for the purpose of debugging and logging.
This node publishes output of the DNN (e.g. softmax layer) using standard Image message.
- for
1D
output like that of the softmax layer, the dimensions of the published image are1x1xC
whereC
is the size of the layer (e.g. 6 for the TrailNetout
layer or 1000 for ImageNet'ssoftmax
layer). - The
encoding
field of the message is in the following format:32FCXXX
whereXXX
is equal to number of channels. For example, for TrailNet:32FC6
, for ImageNet:32FC1000
.32
means 32-bit andF
means float. See the ROS documentation for more details. - The
data
field contains the output of the DNN in byte array representation so it should be cast to a 32-bit float type.
The default topic name is /caffe_ros/network/output
but it can be changed using launch file or command line.
The node requires two parameters:
-
prototxt_path
- path to the Caffe model .prototxt file -
model_path
- path to the Caffe model binary file Additionally, make sure that other parameters, like the input/output layer names (input_layer
/output_layer
) are set correctly. An example of the command line for TrailNet:
rosrun caffe_ros caffe_ros_node __name:=trails_dnn _prototxt_path:=/data/src/redtail/models/pretrained/TrailNet_SResNet-18.prototxt _model_path:=/data/src/redtail/models/pretrained/TrailNet_SResNet-18.caffemodel _output_layer:=out
Note: when using rosrun
remember to prefix parameters with _
(__
for system parameters) and use :=
when assigning a value.
To check that the DNN node is producing some data, make sure that camera node (a real camera via gscam
or an image_pub node) is running, and execute rostopic echo /trails_dnn/network/output
. An example of the output:
header:
seq: 201
stamp:
secs: 1501106944
nsecs: 796543294
frame_id: ''
height: 1
width: 1
encoding: 32FC6
is_bigendian: 0
step: 24
data: [110, 74, 230, 58, 135, 125, 95, 60, 228, 14, 124, 63, 133, 26, 6, 61, 156, 75, 115, 63, 91, 87, 138, 60]
To convert the data
byte array to float array, use this simple Python script:
import struct
data = bytearray([110, 74, 230, 58, 135, 125, 95, 60, 228, 14, 124, 63, 133, 26, 6, 61, 156, 75, 115, 63, 91, 87, 138, 60])
print(struct.unpack('<%df' % (len(data) / 4), data))
In case the node is launched with post_proc:=YOLO
argument, then a special YOLO post-processing is applied. In such case, the node publishes output of the DNN using standard Image message in a certain format: the output is a 2D, single-channel "image" that has the following format: WxHx1
(so encoding == 32FC1
) where W
is fixed and equals 6, and H
is equal to the number of detected objects. For example, if the DNN has detected 2 objects, then the output is 6x2
image. For each detected object, the 6 values are the following:
0 : label (class) of the detected object (e.g. person or a dog).
1 : probability of this object.
2,3: x and y coordinates of the top left corner of the object in image coordinates.
4,5: width and height of the object in image coordinates.
All values are 32-bit floats, including label. Label indices correspond to 20 classes from PASCAL VOC 2012 dataset.
For example, if DNN detected a person (label:14) and a dog (label: 12) in the image with dimensions 320x180 then the output might look something like that:
Label | Prob | X | Y | Width | Height |
---|---|---|---|---|---|
14.0 | 0.5 | 120.0 | 80.0 | 30.0 | 60.0 |
12.0 | 0.4 | 160.0 | 115.0 | 40.0 | 20.0 |
Running inference with INT8
precision improves both performance (speed) and power efficiency on hardware that has native INT8
support. Note that at the moment, INT8
is supported on GP102-GP106 Pascal GPUs. Jetson TX1 and TX2 do not support INT8
instructions so in this case caffe_ros
node will use FP32
mode. For more details on INT8 hardware support and implementation details in TensorRT see here and here.
redtail 2.0 introduces new parameter in caffe_ros
node: data_type
which accepts the values: fp32
, fp16
and int8
. Note that existing parameter, use_fp16
, while deprecated, is still supported.
Using INT8
mode requires additional step to ensure no loss in model quality: model calibration. During model calibration, the network is fed with representative sample from a dataset so relevant statistics can be collected and then used to perform runtime INT8
inference with as little loss in accuracy as possible. The sample calibration batches must be as close to what the network will "see" during runtime. The calibration needs to be performed only once and then can be stored as a file that will be used later during runtime.
caffe_ros
node now supports 2 parameters that enable calibration:
-
int8_calib_src
provides a path (directory or a single image file) to calibration samples. If parameter is directory, all files will be read and used in calibration process. This process may take some time depending on the number of images. Once calibration is performed, this parameter can be removed (or set to empty string) to avoid re-calibration on every run. -
int8_calib_cache
specifies path to calibration cache. If the file exists, it will be loaded and used bycaffe_ros
runtime. If it does not exist, it will be created by calibration process (this requiresint8_calib_src
parameter to be set).
Our TrailNet
and YOLO
models were trained as FP16
-aware models so there is almost no difference in results produced by these networks in FP32
and FP16
modes. However, no special care was taken to make sure the networks also work in INT8
mode so for these network results obtained in INT8
might be different from results in FP32
/FP16
modes.
To verify that the DNN node is working properly, run the basic end-to-end tests that come with caffe_ros
package. In the catkin workspace directory, run the following (replace paths if needed):
catkin_make caffe_ros_tests
export REDTAIL_TEST_DIR=/data/src/redtail/ros/packages/caffe_ros/tests/data
export REDTAIL_MODEL_DIR=/data/src/redtail/models/pretrained
rostest caffe_ros tests_basic.launch test_data_dir:=$REDTAIL_TEST_DIR \
trail_prototxt_path:=$REDTAIL_MODEL_DIR/TrailNet_SResNet-18.prototxt trail_model_path:=$REDTAIL_MODEL_DIR/TrailNet_SResNet-18.caffemodel \
object_prototxt_path:=$REDTAIL_MODEL_DIR/yolo-relu.prototxt object_model_path:=$REDTAIL_MODEL_DIR/yolo-relu.caffemodel
Note: running the tests on the device like TX2 may take about two minutes.
The image_pub
node is a simple ROS node that reads video or image files and publishes frames as a ROS topic as an Image message. There is one mandatory parameter, img_path
, which specifies the path to an image or video file. This node supports setting custom frame rates as well as repeating an image indefinitely.
The default topic is /camera/image_raw
which can be changed using the camera_topic
parameter.
An example of publishing frames from the video file using the file's native frame rate:
rosrun image_pub image_pub_node _img_path:=/data/videos/trail_test.mp4
And example of publishing the same image repeatedly at 30 frames/sec:
rosrun image_pub image_pub_node _img_path:=/data/images/trail_right.png _pub_rate:=30 _repeat:=true