Object Detection
In this step of the tutorial, we'll walk through recreating the previous example for realtime object detection on a live camera feed in only 10 lines of Python code. The program will load the detection network with the detectNet object, capture video frames and process them, and then render the detected objects to the display.
For your convenience and reference, the completed source is available in the python/examples/my-detection.py file of the repo, but the guide below assumes that the script resides in the user's home directory or in an arbitrary directory of your choosing.
First, open up your text editor of choice and create a new file. Below we'll assume that you'll save it to your user's home directory as ~/my-detection.py, but you can name and store it where you wish.
At the top of the source file, we'll import the Python modules that we're going to use in the script. Add import statements to load the jetson.inference and jetson.utils modules used for object detection and camera capture.
import jetson.inference
import jetson.utils
note: these Jetson modules are installed during the sudo make install step of building the repo. If you did not run sudo make install, then these packages won't be found when the example is run.
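If you are unsure whether the install step completed, a quick check like the one below can confirm that Python is able to locate the modules before you run the full script. This is only a sketch: the module_available helper is a hypothetical name introduced here and is not part of the repo.

```python
import importlib.util


def module_available(name):
    """Return True if Python can locate the named module (hypothetical helper)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # raised when a parent package (e.g. "jetson") is itself missing
        return False


for name in ("jetson.inference", "jetson.utils"):
    status = "found" if module_available(name) else "missing (did you run sudo make install?)"
    print("{:s}: {:s}".format(name, status))
```

On a machine where the repo has not been built and installed, both modules should be reported as missing.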
Next, use the following line to create a detectNet object instance that loads the 91-class SSD-Mobilenet-v2 model:
# load the object detection model
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
Note that you can change the model string to one of the values from this table to load a different detection model. We also set the detection threshold here to the default of 0.5 for illustrative purposes - you can tweak it later if needed.
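To get a feel for what the threshold does, the sketch below filters a made-up list of results by confidence. It only illustrates the idea: the actual filtering happens inside detectNet, the sample values are invented, and whether a confidence exactly equal to the threshold is kept is an assumption here.

```python
# Hypothetical (class name, confidence) pairs standing in for raw network output.
candidates = [("person", 0.92), ("dog", 0.55), ("chair", 0.31)]


def apply_threshold(results, threshold=0.5):
    """Keep only the results whose confidence reaches the threshold."""
    return [(label, conf) for label, conf in results if conf >= threshold]


print(apply_threshold(candidates, threshold=0.5))   # person and dog remain
print(apply_threshold(candidates, threshold=0.25))  # all three remain
```

A lower threshold lets more (and less certain) detections through, while a higher one keeps only the most confident results.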
To connect to the camera device for streaming, we'll create an instance of the gstCamera object:
camera = jetson.utils.gstCamera(1280, 720, "/dev/video0") # using V4L2
Its constructor accepts 3 parameters - the desired width, height, and video device to use. Substitute the following snippet depending on whether you are using a MIPI CSI camera or a V4L2 USB camera, along with the preferred resolution:
- MIPI CSI cameras are used by specifying the sensor index ("0" or "1", etc.):
  camera = jetson.utils.gstCamera(1280, 720, "0")
- V4L2 USB cameras are used by specifying their /dev/video node ("/dev/video0", "/dev/video1", etc.):
  camera = jetson.utils.gstCamera(1280, 720, "/dev/video0")
- The width and height should be a resolution that the camera supports.
  - Query the available resolutions with the following commands:
    $ sudo apt-get install v4l-utils
    $ v4l2-ctl --list-formats-ext
  - If needed, change 1280 and 720 above to the desired width/height.
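The two device naming schemes above can be captured in a small helper. This is a minimal sketch: camera_device is a hypothetical function introduced here for illustration, and it only builds the string that would be passed as the third argument to gstCamera.

```python
def camera_device(kind, index=0):
    """Build the device string for gstCamera (hypothetical helper).

    kind  -- "csi" for a MIPI CSI sensor, "v4l2" for a USB camera
    index -- sensor index or /dev/video node number
    """
    if kind == "csi":
        return str(index)
    if kind == "v4l2":
        return "/dev/video{:d}".format(index)
    raise ValueError("unknown camera kind: {!r}".format(kind))


print(camera_device("csi", 0))   # 0
print(camera_device("v4l2", 1))  # /dev/video1

# On the Jetson, the result would be passed as the third argument, e.g.:
# camera = jetson.utils.gstCamera(1280, 720, camera_device("v4l2", 0))
```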
note: for compatible cameras to use, see these sections of the Jetson Wiki:
- Nano: https://eLinux.org/Jetson_Nano#Cameras
- Xavier: https://eLinux.org/Jetson_AGX_Xavier#Ecosystem_Products_.26_Cameras
- TX1/TX2: developer kits include an onboard MIPI CSI sensor module (OV5693)
Next, we'll create an OpenGL display with the glDisplay object and create a main loop that will run until the user exits:
display = jetson.utils.glDisplay()
while display.IsOpen():
    # main loop will go here
Note that the remainder of the code below should be indented underneath this while loop.
The first thing that happens in the main loop is to capture the next video frame from the camera. camera.CaptureRGBA() will wait until the next frame has been sent from the camera, and after it's been acquired by the Jetson, it will convert it to RGBA floating-point format residing in GPU memory.
    img, width, height = camera.CaptureRGBA()
It returns a tuple containing a reference to the image data on the GPU, along with its dimensions.
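As a rough guide to how much GPU memory one converted frame occupies, the arithmetic can be sketched as follows. It assumes each of the four RGBA channels is stored as a 32-bit float, which the text above does not state explicitly.

```python
width, height = 1280, 720  # resolution requested from gstCamera in this tutorial
channels = 4               # R, G, B, A
bytes_per_channel = 4      # assumption: one 32-bit float per channel

frame_bytes = width * height * channels * bytes_per_channel
print("{:d} bytes (~{:.1f} MiB) per frame".format(frame_bytes, frame_bytes / (1024 * 1024)))
```

Requesting a smaller resolution from the camera reduces this figure proportionally.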
Next, the detection network processes the image with the net.Detect() function. It takes in the image, width, and height from camera.CaptureRGBA() and returns a list of detections:
    detections = net.Detect(img, width, height)
This function will also automatically overlay the detection results on top of the input image.
If you want, you can add a print(detections) statement here, and the coordinates, confidence, and class info will be printed out to the terminal for each detection result. Also see the detectNet documentation for info about the different members of the Detection structures that are returned for accessing them directly in a custom application.
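Accessing the members directly might look like the sketch below. The attribute names used (ClassID, Confidence, Left, Top, Right, Bottom) are taken to match the Detection members described in the detectNet documentation, so check them against that reference. The describe helper, the stand-in objects, and the label mapping are all hypothetical, included so that the example runs without a camera.

```python
from types import SimpleNamespace


def describe(detection, class_name):
    """Format one detection as a line of text.

    class_name is any callable mapping a class ID to a label; on the Jetson,
    net.GetClassDesc could be passed here.
    """
    return "{:s} ({:.0%}) at left={:.0f} top={:.0f} right={:.0f} bottom={:.0f}".format(
        class_name(detection.ClassID), detection.Confidence,
        detection.Left, detection.Top, detection.Right, detection.Bottom)


# Stand-in objects with made-up values so the sketch runs without a camera.
fake_detections = [
    SimpleNamespace(ClassID=1, Confidence=0.92, Left=100.0, Top=50.0, Right=400.0, Bottom=700.0),
    SimpleNamespace(ClassID=18, Confidence=0.55, Left=600.0, Top=300.0, Right=900.0, Bottom=650.0),
]
fake_labels = {1: "person", 18: "dog"}  # hypothetical ID-to-label mapping

for d in fake_detections:
    print(describe(d, fake_labels.get))
```

Inside the main loop, the same helper could be applied to each entry of the list returned by net.Detect().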
Finally, we'll visualize the results with OpenGL and update the title of the window to display the current performance:
    display.RenderOnce(img, width, height)
    display.SetTitle("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))
The RenderOnce() function will automatically flip the backbuffer and is used when we only have one image to render.
That's it! For completeness, here's the full source of the Python script that we just created:
import jetson.inference
import jetson.utils
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.gstCamera(1280, 720, "/dev/video0") # using V4L2
display = jetson.utils.glDisplay()
while display.IsOpen():
    img, width, height = camera.CaptureRGBA()
    detections = net.Detect(img, width, height)
    display.RenderOnce(img, width, height)
    display.SetTitle("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))
Note that this version assumes you are using a V4L2 USB camera. See the Opening the Camera Stream section above for info about changing it to use a MIPI CSI camera or supporting different resolutions.
To run the application we just coded, simply launch it from a terminal with the Python interpreter:
$ python my-detection.py
To tweak the results, you can try changing the model that's loaded along with the detection threshold. Have fun!
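One convenient way to experiment is to read the model name and threshold from the command line instead of editing the script each time. The sketch below uses Python's standard argparse module. The flag names --network and --threshold are choices made for this illustration, not something the script above defines.

```python
import argparse

parser = argparse.ArgumentParser(description="Run detection on a live camera feed")
parser.add_argument("--network", default="ssd-mobilenet-v2", help="detection model to load")
parser.add_argument("--threshold", type=float, default=0.5, help="minimum detection confidence")

# An explicit list is parsed here so the sketch runs anywhere;
# in the script itself, call parser.parse_args() to read the real command line.
args = parser.parse_args(["--threshold", "0.3"])
print(args.network, args.threshold)

# On the Jetson, the values would then be passed to detectNet:
# net = jetson.inference.detectNet(args.network, threshold=args.threshold)
```

With this in place, the script could be launched as, for example, python my-detection.py --threshold 0.3.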
Next | Semantic Segmentation with SegNet
Back | Running the Live Camera Detection Demo
© 2016-2019 NVIDIA | Table of Contents