Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding

SIGGRAPH Asia 2025

1 Wangxuan Institute of Computer Technology, Peking University · 2 School of Intelligence Science and Technology, Peking University · 3 Yuanpei College, Peking University · 4 College of Future Technology, Peking University · 5 University of North Carolina at Chapel Hill

*Equal contribution · Corresponding author

NSL (Neural single-shot Structured Light) achieves robust and high-fidelity 3D reconstruction from a single-shot structured light input.

Core Insights

Interactive comparison: Pixel Matching (baseline) vs. Feature Matching (ours).

Matching in Feature Domain

We argue that single-shot structured light decoding should be performed in the Neural Feature Domain rather than the fragile Pixel Domain.

Traditional intensity-based matching breaks down under sensor noise and on surfaces poorly covered by the projected pattern. Our method instead builds cost volumes from deep features, producing significantly denser and more robust depth maps.
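As a minimal illustration of the difference, the sketch below builds a correlation cost volume over disparities from multi-channel feature maps; setting the channel count to 1 recovers plain intensity matching. The function names, disparity range, and the absence of a learned encoder are all illustrative simplifications, not the paper's exact architecture:

```python
import numpy as np

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Build a cost volume by correlating left/right feature maps.

    feat_l, feat_r: (C, H, W) feature maps (C=1 reduces to raw
    intensity matching). Returns (max_disp, H, W) correlation scores.
    """
    C, H, W = feat_l.shape
    cost = np.zeros((max_disp, H, W), dtype=np.float32)
    # L2-normalize along channels so correlation equals cosine similarity
    norm = lambda f: f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    fl, fr = norm(feat_l), norm(feat_r)
    for d in range(max_disp):
        # left pixel x is compared against right pixel x - d
        cost[d, :, d:] = (fl[:, :, d:] * fr[:, :, : W - d]).sum(axis=0)
    return cost

def wta_disparity(cost):
    """Winner-take-all disparity: argmax over the disparity axis."""
    return cost.argmax(axis=0)
```

With many-channel learned features, each match is supported by a whole descriptor rather than a single intensity value, which is why feature-domain matching is far less brittle under noise and weak pattern coverage.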


100% Synthetic Training Data

Real-world data is not required. Training on high-fidelity synthetic data is sufficient for strong Sim2Real generalization.

We introduce a large-scale dataset containing 937K pairs of structured light data:
- 2,860 scenes
- 103 material types
- diverse projection patterns


Prior Guided Refinement

Leveraging Image Priors

Projected patterns inevitably fail in challenging regions (e.g., occlusions, specular highlights). We aggressively leverage image priors to repair these defects.

Using the initial depth from neural matching as a prompt, we fine-tune Depth Anything V2 to recover fine details and complete geometry that structured light alone cannot capture.
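The paper fine-tunes a monocular depth network with the initial depth as a prompt; the underlying idea can be illustrated with a much simpler, hypothetical sketch: align the monocular model's relative depth to the metric structured-light depth via a least-squares scale and shift over trusted pixels, then fall back to the aligned monocular depth wherever matching failed. The function name, mask convention, and hole-filling strategy below are illustrative, not the paper's method:

```python
import numpy as np

def align_and_fill(mono_depth, sl_depth, valid):
    """Align relative monocular depth to metric structured-light depth,
    then fill invalid structured-light pixels with the aligned depth.

    mono_depth: (H, W) relative depth from a monocular model
    sl_depth:   (H, W) metric depth from matching (holes unreliable)
    valid:      (H, W) bool mask of trustworthy structured-light pixels
    """
    m = mono_depth[valid].ravel()
    s = sl_depth[valid].ravel()
    # solve min ||a*m + b - s||^2 for scale a and shift b
    A = np.stack([m, np.ones_like(m)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, s, rcond=None)
    aligned = a * mono_depth + b
    # keep metric depth where valid, fall back to aligned mono elsewhere
    return np.where(valid, sl_depth, aligned)
```

A learned refinement network can go well beyond this global alignment (recovering fine structures and correcting local errors), but the sketch captures why a metric prompt makes monocular priors usable for completion.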

Comparison

Qualitative comparison grid, Cases 1–4. Each case shows the RGB scene, the input structured light (IR) image, our result (NSL), and the result of a traditional pixel-matching method.

Abstract

We consider the problem of active 3D imaging using single-shot structured light systems. Traditional structured light methods typically decode depth correspondences through pixel-domain matching algorithms, resulting in limited robustness under challenging scenarios like occlusions, fine-structured details, and non-Lambertian surfaces.

Inspired by recent advances in neural feature matching, we propose a learning-based structured light decoding framework that performs robust correspondence matching within feature space rather than the fragile pixel domain. Our method extracts neural features from the projected patterns and captured infrared (IR) images, explicitly incorporating their geometric priors by building cost volumes in feature space.

To further enhance depth quality, we introduce a depth refinement module that leverages strong priors from large-scale monocular depth estimation models. Experiments demonstrate that our method, trained exclusively on synthetic data, generalizes well to real-world indoor environments and outperforms commercial systems.

Method Pipeline

NSL Pipeline

The pipeline of NSL. Given a single IR image or a stereo pair of IR images together with the projected pattern, NSL first estimates an initial raw depth map via the Neural Feature Matching module (left path). The Monocular Depth Refinement module then incorporates priors from a monocular depth estimation model, using the initial depth as a prompt (right path), to produce a final depth map with enhanced structural detail.

BibTeX

@inproceedings{li2025nsl,
  title     = {Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding},
  author    = {Jiaheng Li and Qiyu Dai and Lihan Li and Praneeth Chakravarthula and He Sun and Baoquan Chen and Wenzheng Chen},
  booktitle = {SIGGRAPH Asia 2025 Conference Papers},
  year      = {2025},
  publisher = {ACM}
}