A 3D camera is a camera that can capture objects or scenes in three dimensions. Unlike images recorded by traditional cameras, those captured by 3D cameras contain depth information about the scene, reconstructing how we perceive the three-dimensional real world. Depth perception enables 3D cameras to be used in a wide variety of scenarios, such as AR/VR, face recognition, and autonomous driving.
At present, the three prevalent 3D imaging technologies are binocular stereo vision, time of flight (ToF), and structured light. Depending on whether a light source is actively projected, these techniques fall into passive and active categories: binocular stereo vision is passive, while the other two are active.
Binocular stereo vision
This technique mimics human binocular vision. Our two eyes are 60 mm to 70 mm apart, so each eye sees the same object from a slightly different position; the brain processes this disparity to perceive depth. Stereo cameras likewise have two lenses, typically about 60 mm apart, each capturing a slightly different image, and a processor computes the disparity between the images to generate a depth map. It is a passive technique because it requires no light other than ambient light, which makes it suitable for outdoor use in reasonably good lighting. However, in low light, or when the object or scene has little texture, the cameras struggle to extract and match stereo features. Stereo matching also demands substantial processing power to deliver both high resolution and real-time output. Constrained by the baseline, i.e. the distance between the two lenses, stereo cameras work at short range, often within 2 meters. Well-known stereo cameras include the ZED 2K Stereo Camera from Stereolabs and Point Grey's Bumblebee.
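The core geometry behind stereo matching is simple triangulation: depth is inversely proportional to disparity. A minimal sketch follows; the focal length, baseline, and disparity values are illustrative, not taken from any particular camera.

```python
# Depth from stereo disparity: Z = f * B / d, where f is the focal
# length in pixels, B the baseline in meters, and d the disparity
# (pixel offset of the same point between the two images).

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in meters for one matched pixel pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: a 60 mm baseline, 700 px focal length,
# and a measured disparity of 21 px:
z = depth_from_disparity(700.0, 0.06, 21.0)
print(round(z, 2))  # 2.0 (meters)
```

Because disparity shrinks as 1/Z, small disparities at long range become hard to measure reliably, which is one reason stereo cameras with short baselines are limited to a couple of meters.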
Time of Flight (ToF)
ToF calculates the distance between the camera and an object by measuring the time it takes projected infrared light to travel from the camera, bounce off the object's surface, and return to the sensor. Because the speed of light is constant, the processor can compute the object's distance by analyzing the phase shift between the emitted and returned light, and from these distances reconstruct the scene. Unlike stereo vision, ToF is an active technique: it projects its own light to measure distance rather than relying on ambient light, so it works well in dim conditions. ToF cameras offer fast processing, comparatively long range, and compact designs, and are widely used in VR/AR, robotic navigation, and autonomous driving. Well-known products include Microsoft's Kinect v2 and the rear depth camera in the Samsung Galaxy Note 10.
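For continuous-wave ToF, the phase-shift-to-distance conversion can be sketched as below. The 20 MHz modulation frequency is an illustrative assumption, not the spec of any particular camera.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_distance(phase_shift_rad: float, mod_freq_hz: float) -> float:
    """Distance from the measured phase shift of a continuous-wave ToF signal.

    d = c * delta_phi / (4 * pi * f): the light travels to the object
    and back, so the round trip contributes the factor of 2 folded
    into the 4*pi denominator.
    """
    return C * phase_shift_rad / (4 * math.pi * mod_freq_hz)

# A pi/2 phase shift measured at a 20 MHz modulation frequency:
d = tof_distance(math.pi / 2, 20e6)
print(round(d, 3))  # 1.874 (meters)
```

Note that the phase wraps every 2π, so the maximum unambiguous range is c / (2f): about 7.5 m at 20 MHz. Raising the modulation frequency improves precision but shortens this range, which is one of ToF's core design trade-offs.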
Structured light
Structured light is another active 3D imaging technique: the camera projects its own modulated light patterns onto the surface of a scene and calculates the disparity between the original projected pattern and the observed pattern as deformed by the surface. As an active technique, a structured light camera also works well in scenes lacking light or texture. Because the projected pattern is precisely modulated, this method achieves higher accuracy at short range than ToF, with depth resolution reaching the submillimeter level. Structured light cameras are mostly used indoors, because sunlight interferes with the projected patterns. Intel adopts structured light in its RealSense depth camera series. The U.S. startup Revopoint uses MEMS structured light with dual lenses in its Acusense 3D depth camera, which offers RGB resolution up to 8 megapixels, accuracy up to 0.1 mm, and a depth range from 0.2 meters to 2 meters. Structured light cameras suit applications requiring high accuracy at short range, such as face recognition, gesture recognition, and industrial inspection.
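The submillimeter claim can be checked with a back-of-the-envelope triangulation model: differentiating Z = fB/d gives a depth resolution of dZ ≈ Z² · Δd / (fB), where Δd is the smallest pattern shift the decoder can resolve. The sketch below uses hypothetical values (1400 px focal length, 80 mm projector-camera baseline, 0.1 px sub-pixel decoding), not the parameters of any real product.

```python
def depth_resolution(z_m: float, focal_px: float, baseline_m: float,
                     disp_step_px: float = 1.0) -> float:
    """Smallest resolvable depth change (meters) at range z for a given
    disparity quantization step: dZ ~= Z^2 * dd / (f * B)."""
    return z_m ** 2 * disp_step_px / (focal_px * baseline_m)

# Hypothetical setup: 0.3 m range, f = 1400 px, B = 80 mm,
# sub-pixel pattern decoding resolving 0.1 px shifts:
dz_mm = depth_resolution(0.3, 1400.0, 0.08, 0.1) * 1000
print(round(dz_mm, 2))  # 0.08 (mm)
```

The Z² term explains why structured light excels at short range: at 0.3 m the same setup resolves hundredths of a millimeter, but at 3 m the resolution degrades a hundredfold.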
Pros and Cons
To better understand the differences among 3D cameras, it helps to compare their pros and cons. Based on the key factors involved in capturing high-quality depth data, the table below compares the depth range, accuracy, and typical applications of each technique.
| Technology | Binocular stereo vision | ToF | Structured light |
| --- | --- | --- | --- |
| Working distance | ≤2 m | 0.4–5 m | 0.2–3 m |
| Accuracy | 5%–10% of distance | ≤0.5% of distance | ≤1 mm |
| Resolution | Medium | Low | High |
| Power consumption | High | Medium | Medium |
| Use environment | Scenes with detectable features | Indoor and outdoor | Indoor |
| Frame rate | High | Variable | ≈30 fps |
| Hardware cost | Low | Medium | High |
| Software processing requirement | High | Low | Medium |
| Applications | Ranging, 3D reconstruction | VR, AR, autonomous vehicles | Face recognition |