DISCONTINUATION OF PROJECT.
This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
This project is a demonstration of how to use an Intel® RealSense™ camera to create a full 3D model of an object by moving the depth camera around it. It estimates the movement of the camera without any additional motion sensor, using only the depth data, and then combines the frames into a single model.
Parts of the code originate from the Pointcloud demo. An explanation of how to use the depth camera is in the articles Depth Camera Capture in HTML5 and How to create a 3D view in WebGL.
The project consists of three main parts: motion estimation, model creation, and rendering. Almost everything is performed on the GPU using WebGL shaders.
This stage of the demo uses the ICP (Iterative Closest Point) algorithm to estimate the movement of the camera without any motion sensor (a problem generally known as SLAM). It was inspired by the paper KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera, which describes a version of the algorithm optimized for GPUs. Thanks to this design, the demo is able to process the frames in real time even on a relatively weak GPU.
The images below show how two frames of data (that were artificially created) get aligned over the course of 10 steps of the ICP algorithm.
The principle is similar to linear regression. In linear regression, you are trying to fit a line through a noisy set of points while minimizing the error. With the ICP algorithm, we are trying to find a motion that matches two pointclouds together as closely as possible, assuming 6DOF (six degrees of freedom). If we had exact information about which point from one pointcloud corresponds to which point in the other pointcloud, this would be relatively easy. To some degree, this could be achieved by recognizing features of a scene (e.g. the corners of a table) and deciding that they match up, but this approach is computationally intensive and difficult to implement. A simpler approach is to treat whatever point is closest as the corresponding point. The closest point could be found by a brute-force search or by using a k-d tree, but this project uses a heuristic that is very well suited for the GPU and is described in the shaders/points-fshader.js file. It is not as exact as using a k-d tree, but it has linear time complexity for each point.
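To make the correspondence step concrete, here is a simplified CPU sketch of projective data association, the kind of GPU-friendly heuristic used in KinectFusion-style pipelines: each point is projected into the other frame's image and paired with whatever point is stored at that pixel. The names below are hypothetical and the demo's actual heuristic is the one documented in shaders/points-fshader.js; each ICP pass then estimates the 6DOF motion that minimizes the error between the matched pairs.

```js
// Illustrative CPU sketch of projective data association. All names here
// (projectToPixel, findCorrespondences, intrinsics) are hypothetical; the
// heuristic the demo actually uses is documented in shaders/points-fshader.js.

// Project a 3D point in the destination camera's coordinate system onto its
// image plane, using a pinhole model with focal lengths (fx, fy) and
// principal point (cx, cy).
function projectToPixel(point, intrinsics) {
  const { fx, fy, cx, cy } = intrinsics;
  return {
    x: Math.round((point.x / point.z) * fx + cx),
    y: Math.round((point.y / point.z) * fy + cy),
  };
}

// For each source point (already transformed by the current motion estimate),
// look up the destination point stored at the pixel it projects to and treat
// that as its correspondence, as long as the two points are close enough.
function findCorrespondences(sourcePoints, destPointMap, width, height,
                             intrinsics, maxDistance) {
  const pairs = [];
  for (const p of sourcePoints) {
    if (p.z <= 0) continue;                        // behind the camera
    const { x, y } = projectToPixel(p, intrinsics);
    if (x < 0 || y < 0 || x >= width || y >= height) continue;
    const q = destPointMap[y * width + x];         // destination point at that pixel
    if (!q) continue;                              // hole in the depth data
    const dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    if (dx * dx + dy * dy + dz * dz < maxDistance * maxDistance) {
      pairs.push([p, q]);
    }
  }
  return pairs;
}
```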
This is the most complex part of the project, consisting of three different shaders that are run several times per frame of data. The documentation is in the shaders and in movement.js. A much simpler implementation, used for testing, is in the file movement_cpu.js.
Since WebGL 2.0 doesn't have compute shaders, the calculations are done in fragment shaders that take a texture with floating point data as input and render the output into another floating point texture.
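As a minimal sketch of that pattern (assuming an existing canvas, width, and height; this is not the demo's actual setup code), a floating point texture can be attached to a framebuffer so that a fragment shader pass writes its results there:

```js
// Minimal WebGL 2.0 sketch: render fragment shader output into a floating
// point texture so it can be used as the input of the next pass.
// canvas, width and height are assumed to exist; this is not the demo's code.
const gl = canvas.getContext('webgl2');

// Rendering to float textures still requires this extension in WebGL 2.0.
if (!gl.getExtension('EXT_color_buffer_float')) {
  throw new Error('EXT_color_buffer_float is not supported');
}

// The texture that will hold the floating point output data.
const outputTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, outputTexture);
gl.texStorage2D(gl.TEXTURE_2D, 1, gl.RGBA32F, width, height);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

// Attach it to a framebuffer; drawing a full-screen quad with the desired
// fragment shader now writes the results into outputTexture.
const framebuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, outputTexture, 0);
```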
If memory and bandwidth were free, we could just store all the pointclouds and render them together. However, this would not only be very inefficient (we would have millions of points after just a few minutes of recording), it would also end up looking very noisy. A better solution is to create a volumetric model. You can imagine it as a 3D grid where we simply set a voxel (volumetric pixel) to 1 if a point lies within it. This would still be very inefficient and noisy, with the addition of looking too much like Minecraft. An even better way is to create a volumetric model using a signed distance function: instead of storing 1 or 0 in a voxel, we store the distance from the center of the voxel to the object's surface. This method is described in the paper A Volumetric Method for Building Complex Models from Range Images.
The demo uses a 3D texture to store the volumetric model. The details of the model creation are described in the file shaders/model-fshader.js.
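As a rough illustration of the per-voxel update, the sketch below (hypothetical names, not the demo's actual code) truncates the measured signed distance and folds it into a weighted running average, which is what smooths out the noise of individual depth frames:

```js
// Illustrative sketch of a truncated signed distance (TSDF) update for one
// voxel. sdf is the signed distance from the voxel center to the observed
// surface along the camera ray (positive in front of the surface). The real
// update used by the demo is in shaders/model-fshader.js.
function updateVoxel(voxel, sdf, truncation) {
  if (sdf < -truncation) return voxel;            // far behind the surface: keep the old value
  const tsdf = Math.min(1.0, sdf / truncation);   // truncate to the range [-1, 1]
  const newWeight = voxel.weight + 1;
  return {
    distance: (voxel.distance * voxel.weight + tsdf) / newWeight,
    weight: newWeight,
  };
}
```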
This stage is the simplest and is described in more detail in the file shaders/renderer-fshader.js. It uses the raymarching algorithm (a simpler and faster relative of raytracing) to render the volumetric model and then applies Phong lighting to it.
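In essence, the renderer steps along each view ray until the stored signed distance changes sign, which marks the surface. Below is a simplified sketch of that idea in plain JS with an assumed sampleSdf function; the shader in shaders/renderer-fshader.js is what the demo actually runs.

```js
// March a ray through the signed distance volume with a fixed step size and
// return the first position where the surface is crossed, or null if the ray
// misses. sampleSdf(position) is assumed to return the stored signed distance.
function raymarch(origin, direction, sampleSdf, maxDistance, step) {
  let previous = sampleSdf(origin);
  for (let t = step; t < maxDistance; t += step) {
    const position = {
      x: origin.x + direction.x * t,
      y: origin.y + direction.y * t,
      z: origin.z + direction.z * t,
    };
    const current = sampleSdf(position);
    if (previous > 0 && current <= 0) {
      // Surface crossed between the previous and current sample; a real
      // renderer would interpolate between them and compute Phong lighting
      // from the gradient of the signed distance (the surface normal).
      return position;
    }
    previous = current;
  }
  return null; // the ray left the volume without hitting the surface
}
```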
The project works on Windows, Linux and ChromeOS with Intel® RealSense™ SR300 (and related cameras like Razer Stargazer or Creative BlasterX Senz3D) and R200 3D Cameras.
- To make sure your system supports the camera, follow the installation guide in librealsense.
- Connect the camera.
- Go to the demo page.
To run the code locally, give Chromium the parameter --use-fake-ui-for-media-stream so that it doesn't ask you for camera permissions, which are remembered only for https pages.
Intel and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.