DISCONTINUATION OF PROJECT.
This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
This project is a demonstration of how to use an Intel® RealSense™ camera to create a full 3D model of an object by moving the depth camera around it. It estimates the movement of the camera without any additional motion sensor, using only the depth data, and then combines the frames into a single model.
Parts of the code originate from the Pointcloud demo. An explanation of how to use the depth camera is in the articles Depth Camera Capture in HTML5 and How to create a 3D view in WebGL.
The project consists of three main parts: motion estimation, model creation, and rendering. Almost everything is performed on the GPU using WebGL shaders.
This stage of the demo uses the ICP (Iterative Closest Point) algorithm to estimate the movement of the camera without any motion sensor (a problem generally known as SLAM). It was inspired by the paper KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera, which describes a version of the algorithm optimized for GPUs. Thanks to this design, the demo is able to process the frames in real time even on a relatively weak GPU.
The images below show how two frames of data (that were artificially created) get aligned over the course of 10 steps of the ICP algorithm.
The principle is similar to linear regression. In linear regression, you are trying to fit a line through a noisy set of points while minimizing the error. With the ICP algorithm, we are trying to find a motion that matches two pointclouds together as closely as possible, assuming 6DOF (six degrees of freedom). If we had exact information about which point from one pointcloud corresponds to which point in the other pointcloud, this would be relatively easy. To some degree, this could be achieved by recognizing features of a scene (e.g. the corners of a table) and deciding that they match up, but this approach is computationally intensive and difficult to implement. A simpler approach is to treat whatever point is closest as the corresponding point. The closest point could be found by a brute-force search or by using a k-d tree, but this project uses a heuristic that is very well suited for the GPU and is described in the shaders/points-fshader.js file. It is not as exact as using a k-d tree, but it has linear time complexity for each point.
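To make the correspondence step concrete, here is a simplified CPU sketch of projective data association, the kind of GPU-friendly heuristic used in KinectFusion-style pipelines: each point is projected into the other frame's image and paired with whatever point is stored at that pixel. The names below are hypothetical and the demo's actual heuristic is the one documented in shaders/points-fshader.js; each ICP pass then estimates the 6DOF motion that minimizes the error between the matched pairs.

```js
// Illustrative CPU sketch of projective data association. All names here
// (projectToPixel, findCorrespondences, intrinsics) are hypothetical; the
// heuristic the demo actually uses is documented in shaders/points-fshader.js.

// Project a 3D point in the destination camera's coordinate system onto its
// image plane, using a pinhole model with focal lengths (fx, fy) and
// principal point (cx, cy).
function projectToPixel(point, intrinsics) {
  const { fx, fy, cx, cy } = intrinsics;
  return {
    x: Math.round((point.x / point.z) * fx + cx),
    y: Math.round((point.y / point.z) * fy + cy),
  };
}

// For each source point (already transformed by the current motion estimate),
// look up the destination point stored at the pixel it projects to and treat
// that as its correspondence, as long as the two points are close enough.
function findCorrespondences(sourcePoints, destPointMap, width, height,
                             intrinsics, maxDistance) {
  const pairs = [];
  for (const p of sourcePoints) {
    if (p.z <= 0) continue;                        // behind the camera
    const { x, y } = projectToPixel(p, intrinsics);
    if (x < 0 || y < 0 || x >= width || y >= height) continue;
    const q = destPointMap[y * width + x];         // destination point at that pixel
    if (!q) continue;                              // hole in the depth data
    const dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    if (dx * dx + dy * dy + dz * dz < maxDistance * maxDistance) {
      pairs.push([p, q]);
    }
  }
  return pairs;
}
```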
This is the most complex part of the project, consisting of three different shaders that are run several times per frame of data. The documentation is in the shaders and in movement.js. A much simpler implementation, used for testing, is in the file movement_cpu.js.
Since WebGL 2.0 doesn't have compute shaders, the calculations are done in fragment shaders that take a texture with floating point data as input and render the output into another floating point texture.
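As a minimal sketch of that pattern (assuming an existing canvas, width, and height; this is not the demo's actual setup code), a floating point texture can be attached to a framebuffer so that a fragment shader pass writes its results there:

```js
// Minimal WebGL 2.0 sketch: render fragment shader output into a floating
// point texture so it can be used as the input of the next pass.
// canvas, width and height are assumed to exist; this is not the demo's code.
const gl = canvas.getContext('webgl2');

// Rendering to float textures still requires this extension in WebGL 2.0.
if (!gl.getExtension('EXT_color_buffer_float')) {
  throw new Error('EXT_color_buffer_float is not supported');
}

// The texture that will hold the floating point output data.
const outputTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, outputTexture);
gl.texStorage2D(gl.TEXTURE_2D, 1, gl.RGBA32F, width, height);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

// Attach it to a framebuffer; drawing a full-screen quad with the desired
// fragment shader now writes the results into outputTexture.
const framebuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, outputTexture, 0);
```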
If memory and bandwidth were free, we could just store all the pointclouds and render them together. However, this would not only be very inefficient (we would have millions of points after just a few minutes of recording), it would also end up looking very noisy. A better solution is to create a volumetric model. You can imagine it as a 3D grid where we simply set a voxel (volumetric pixel) to 1 if a point lies within it. This would still be very inefficient and noisy, with the addition of looking too much like Minecraft. An even better way is to create a volumetric model using a signed distance function: instead of storing 1 or 0 in a voxel, we store the distance from the center of the voxel to the object's surface. This method is described in the paper A Volumetric Method for Building Complex Models from Range Images.
The demo uses a 3D texture to store the volumetric model. The details of the model creation are described in the file shaders/model-fshader.js.
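As a rough illustration of the per-voxel update, the sketch below (hypothetical names, not the demo's actual code) truncates the measured signed distance and folds it into a weighted running average, which is what smooths out the noise of individual depth frames:

```js
// Illustrative sketch of a truncated signed distance (TSDF) update for one
// voxel. sdf is the signed distance from the voxel center to the observed
// surface along the camera ray (positive in front of the surface). The real
// update used by the demo is in shaders/model-fshader.js.
function updateVoxel(voxel, sdf, truncation) {
  if (sdf < -truncation) return voxel;            // far behind the surface: keep the old value
  const tsdf = Math.min(1.0, sdf / truncation);   // truncate to the range [-1, 1]
  const newWeight = voxel.weight + 1;
  return {
    distance: (voxel.distance * voxel.weight + tsdf) / newWeight,
    weight: newWeight,
  };
}
```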
This stage is the simplest and is described in more detail in the file shaders/renderer-fshader.js. It uses the raymarching algorithm (a simpler and faster relative of raytracing) to render the volumetric model and then applies Phong lighting to it.
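In essence, the renderer steps along each view ray until the stored signed distance changes sign, which marks the surface. Below is a simplified sketch of that idea in plain JS with an assumed sampleSdf function; the shader in shaders/renderer-fshader.js is what the demo actually runs.

```js
// March a ray through the signed distance volume with a fixed step size and
// return the first position where the surface is crossed, or null if the ray
// misses. sampleSdf(position) is assumed to return the stored signed distance.
function raymarch(origin, direction, sampleSdf, maxDistance, step) {
  let previous = sampleSdf(origin);
  for (let t = step; t < maxDistance; t += step) {
    const position = {
      x: origin.x + direction.x * t,
      y: origin.y + direction.y * t,
      z: origin.z + direction.z * t,
    };
    const current = sampleSdf(position);
    if (previous > 0 && current <= 0) {
      // Surface crossed between the previous and current sample; a real
      // renderer would interpolate between them and compute Phong lighting
      // from the gradient of the signed distance (the surface normal).
      return position;
    }
    previous = current;
  }
  return null; // the ray left the volume without hitting the surface
}
```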
The project works on Windows, Linux and ChromeOS with Intel® RealSense™ SR300 (and related cameras like Razer Stargazer or Creative BlasterX Senz3D) and R200 3D Cameras.
- To make sure your system supports the camera, follow the installation guide in librealsense.
- Connect the camera.
- Go to the demo page.
To run the code locally, give Chromium the parameter --use-fake-ui-for-media-stream so that it doesn't ask you for camera permissions, which are remembered only for https pages.
Intel and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.