3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions


Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose an unsupervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also to generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin.

Overview

From existing RGB-D reconstructions (a), we extract local 3D patches and correspondence labels from scans of different views (b). We collect pairs of matching and non-matching local 3D patches and convert them into a volumetric representation (c) to train a 3D ConvNet-based descriptor (d). This geometric descriptor can be used to establish correspondences for matching 3D geometry in various applications (e) such as reconstruction, model alignment, and surface correspondence.

Paper

Latest version (5 Dec 2016): arXiv:1603.08182 [cs.CV] or here
Older version (27 Mar 2016): 3DMatch: Learning the Matching of Local 3D Geometry in Range Scans

To appear at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017 as an oral presentation.


Bibtex

@inproceedings{zeng20163dmatch,
    title={3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions},
    author={Zeng, Andy and Song, Shuran and Nie{\ss}ner, Matthias and Fisher, Matthew and Xiao, Jianxiong and
            Funkhouser, Thomas},
    booktitle={CVPR},
    year={2017}
}

Video



Code

All 3DMatch code (for training and testing) can be found in our Github repository here.




Keypoint Matching Benchmark

This benchmark evaluates how well descriptors (both 2D and 3D) can establish correspondences between RGB-D frames of different views. The dataset contains 2D RGB-D patches and 3D patches (local TDF voxel grid volumes) of wide-baseline correspondences, which are sampled from our testing split of the RGB-D reconstruction datasets. The pixel size of each 2D patch is determined by projecting the 0.3m³ local 3D patch around the interest point onto the image plane. We provide Matlab code for generating similar correspondence datasets here. Although our baselines are 3D approaches that use depth information only, we are also looking for descriptor algorithms (2D or 3D) that leverage color information, or both modalities.

Benchmark Leaderboard

Method            Error    2D-Based   3D-Based   Uses Color   Uses Depth
3DMatch [1]       35.3%    no         yes        no           yes
FPFH [3]          61.3%    no         yes        no           yes
Spin-Images [2]   83.7%    no         yes        no           yes

To add your results to the leaderboard, please email us your algorithm's .log file for the test set to andyz[at]princeton[dot]edu

[1] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, T. Funkhouser. 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. CVPR 2017.
[2] A.E. Johnson, M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. PAMI 1999.
[3] R.B. Rusu, N. Blodow, M. Beetz. Fast point feature histograms (FPFH) for 3D registration. ICRA 2009.

Download and Description

There are two Matlab .mat files, one for the validation set and one for the test set. Download links:

The validation set contains 10,000 pairs of RGB-D patches and their ground truth correspondence labels (binary: 1 for match, 0 for non-match). The test set contains similar data for another 10,000 pairs, except that the ground truth correspondence labels have been left out. Each Matlab .mat file contains the following variables:

data - a 10,000x2 cell array of structs. Each struct contains the 2D/3D patch data of an interest point, with the variables:
• framePath - path to the scene, sequence, and RGB-D frame from which the patch data was extracted
• pixelCoords - 1x2 array with the pixel coordinates of the interest point on the RGB-D frame
• camCoords - 3x1 array with the 3D camera coordinates of the interest point
• bboxCornersCam - 3x8 matrix with the 3D camera coordinates of the corners of the 0.3m³ bounding box around the interest point
• bboxRangePixels - 2x2 matrix where each row is a pixel-coordinate corner of the projected bounding box on the image plane
• camK - 3x3 matrix of the camera intrinsics
• colorPatch - HxWx3 uint8 matrix of the RGB patch around the interest point
• depthPatch - HxW matrix of the depth patch (in meters) around the interest point
• voxelGridTDF - 30x30x30 matrix of TDF voxel grid values (voxel size is 0.01m) around the interest point
labels - (in validation-set.mat only) a 10,000x1 cell array of binary correspondence labels (1 for match, 0 for non-match) for each pair (row) of interest points saved in data.

Update (as of Mar 2018): for convenience, feel free to download the labels for the test set here.
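
As a quick sanity check after downloading, the following Matlab sketch loads the validation set, reads the two TDF volumes of the first patch pair, and projects the bounding box corners of one patch onto its image plane. File and field names follow the description above; everything else is illustrative:

    % Minimal sketch: inspect one patch pair from the validation set.
    % Assumes validation-set.mat has been downloaded to the working directory.
    load('validation-set.mat');                % loads 'data' and 'labels'

    pair  = data(1,:);                         % first pair of interest points
    label = labels{1};                         % 1 = match, 0 = non-match

    % 30x30x30 TDF volumes that a 3D descriptor would consume
    tdf1 = pair{1}.voxelGridTDF;
    tdf2 = pair{2}.voxelGridTDF;

    % Project the 0.3m³ bounding box corners of the first patch onto its image
    % plane; this projection determines the pixel size of the 2D patch
    proj = pair{1}.camK * pair{1}.bboxCornersCam;        % 3x8 projected corners
    pixX = proj(1,:) ./ proj(3,:);
    pixY = proj(2,:) ./ proj(3,:);
    fprintf('Pair 1: label = %d, 2D patch spans ~%.0f x %.0f pixels\n', ...
            label, max(pixX) - min(pixX), max(pixY) - min(pixY));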

We do not provide a fixed training set; however, we do provide the C++/CUDA code here that we used to sample training correspondences on the fly when training 3DMatch. You can also generate your own fixed training set by following the instructions here (see makeCorresDataset.m), using the training scenes instead of the testing scenes.
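
For intuition, the sketch below illustrates the basic idea behind sampling one matching correspondence from two registered RGB-D frames of the same scene. This is not the released C++/CUDA code; the frame paths, the 2 cm depth-agreement threshold, and the 0-indexed intrinsics convention are illustrative assumptions, and the per-frame file format is described in the RGB-D Reconstruction Datasets section below:

    % Minimal sketch: sample a matching pixel correspondence from two frames.
    K      = dlmread('camera-intrinsics.txt');                      % 3x3 intrinsics
    depth1 = double(imread('seq-01/frame-000000.depth.png')) / 1000; % mm -> meters
    depth2 = double(imread('seq-01/frame-000050.depth.png')) / 1000;
    pose1  = dlmread('seq-01/frame-000000.pose.txt');               % camera-to-world
    pose2  = dlmread('seq-01/frame-000050.pose.txt');

    % Pick a random pixel with valid depth in frame 1 and lift it to world space
    [vs, us] = find(depth1 > 0);
    k  = randi(numel(us));
    u1 = us(k) - 1; v1 = vs(k) - 1;                  % 0-indexed pixel coordinates
    z1 = depth1(vs(k), us(k));
    camPt   = [(u1 - K(1,3)) * z1 / K(1,1); (v1 - K(2,3)) * z1 / K(2,2); z1];
    worldPt = pose1(1:3,1:3) * camPt + pose1(1:3,4);

    % Project the world point into frame 2; if the observed depth agrees, the
    % two pixels are a matching correspondence (non-matches are sampled far away)
    camPt2 = pose2(1:3,1:3)' * (worldPt - pose2(1:3,4));
    pix    = K * camPt2;
    u2 = round(pix(1)/pix(3)) + 1; v2 = round(pix(2)/pix(3)) + 1;
    if camPt2(3) > 0 && u2 >= 1 && v2 >= 1 && ...
            u2 <= size(depth2,2) && v2 <= size(depth2,1) && ...
            abs(depth2(v2,u2) - camPt2(3)) < 0.02
        fprintf('Match: (%d,%d) in frame 0 <-> (%d,%d) in frame 50\n', u1, v1, u2-1, v2-1);
    end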

Evaluation

To evaluate on this benchmark, your descriptor algorithm should output a .log file where each row contains the descriptor distance (or, for some algorithms, the confidence of non-correspondence) for the corresponding pair of patches. Our Github toolbox contains an example .log file (for 3DMatch), as well as an example evaluation script for the validation set. Error is computed as the false positive rate at 95% recall.
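
As a rough illustration of the metric (not the reference evaluation script), assuming a 10,000x1 vector dists of your descriptor distances in the same row order as data, the error can be computed on the validation set as follows:

    % Minimal sketch: false positive rate at 95% recall.
    % 'dists' holds descriptor distances (lower = more similar), e.g. read from
    % your algorithm's .log file; 'labels' comes from validation-set.mat and is
    % assumed to hold one scalar 0/1 label per cell.
    gt = cell2mat(labels);                            % 1 = match, 0 = non-match
    [sortedDists, order] = sort(dists, 'ascend');
    sortedGt = gt(order);

    recall   = cumsum(sortedGt) / sum(sortedGt);      % recall as the threshold grows
    idx      = find(recall >= 0.95, 1, 'first');      % smallest threshold reaching 95% recall
    numFalse = sum(sortedGt(1:idx) == 0);             % non-matches accepted at that threshold
    fpr      = numFalse / sum(gt == 0);               % false positive rate
    fprintf('Error (FPR at 95%% recall): %.1f%%\n', fpr * 100);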




Geometric Registration Benchmark

Similar in spirit to the registration benchmark from Robust Reconstruction of Indoor Scenes, this benchmark evaluates the performance of geometric registration algorithms in the context of scene reconstruction. However, in contrast to prior work, this benchmark uses real-world RGB-D scanning data instead of synthetic data, in order to promote registration algorithms that are robust to depth data from modern commodity range sensors (e.g. Microsoft Kinect, Intel RealSense).

Benchmark Leaderboard

Method            Recall   Precision
3DMatch [1]       66.8%    40.1%
Spin-Images [2]   51.8%    31.6%
FPFH [3]          44.2%    30.7%

To add your results to the leaderboard, please email us at andyz[at]princeton[dot]edu

[1] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, T. Funkhouser. 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. CVPR 2017.
[2] A.E. Johnson, M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. PAMI 1999.
[3] R.B. Rusu, N. Blodow, M. Beetz. Fast point feature histograms (FPFH) for 3D registration. ICRA 2009.

Downloads

This benchmark contains eight sets of scene fragments created from our testing split of the RGB-D reconstruction datasets. These fragments are available for download in the links below. Each fragment is a 3D point cloud of a surface, integrated from 50 depth frames using TSDF volumetric fusion, and saved to a .ply file. We also provide the fusion code to generate these fragments here.
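
For intuition only, here is a minimal Matlab sketch of TSDF integration over 50 depth frames. This is not the released fusion code; the grid bounds, voxel size, and truncation distance are placeholder values, and the per-frame file names follow the dataset format described in the RGB-D Reconstruction Datasets section below:

    % Minimal TSDF fusion sketch (placeholder parameters, projective SDF).
    K = dlmread('camera-intrinsics.txt');             % 3x3 depth intrinsics
    voxelSize = 0.02; trunc = 0.05;                    % 2 cm voxels, 5 cm truncation
    gridDim = [256 256 256]; gridOrigin = [-2.5 -2.5 0];   % example bounds (meters)
    [gx,gy,gz] = ndgrid(0:gridDim(1)-1, 0:gridDim(2)-1, 0:gridDim(3)-1);
    worldPts = [gx(:) gy(:) gz(:)] * voxelSize + gridOrigin;   % Nx3 voxel centers
    clear gx gy gz;
    tsdf = ones(size(worldPts,1),1); weight = zeros(size(worldPts,1),1);

    for i = 0:49                                       % fuse 50 consecutive frames
        depth = double(imread(sprintf('seq-01/frame-%06d.depth.png',i))) / 1000;  % mm -> m
        pose  = dlmread(sprintf('seq-01/frame-%06d.pose.txt',i));   % 4x4 camera-to-world
        camPts = (pose(1:3,1:3)' * (worldPts' - pose(1:3,4)))';     % world -> camera
        pix = (K * camPts')';                          % project voxel centers to pixels
        u = round(pix(:,1)./pix(:,3)) + 1;             % +1 assumes 0-indexed intrinsics
        v = round(pix(:,2)./pix(:,3)) + 1;
        valid = camPts(:,3) > 0 & u >= 1 & v >= 1 & u <= size(depth,2) & v <= size(depth,1);
        d = zeros(size(u)); d(valid) = depth(sub2ind(size(depth), v(valid), u(valid)));
        sdf = d - camPts(:,3);                         % distance to observed surface along z
        update = valid & d > 0 & sdf > -trunc;         % skip voxels far behind the surface
        tsdf(update) = (tsdf(update).*weight(update) + min(1, sdf(update)/trunc)) ...
                       ./ (weight(update) + 1);        % running weighted average
        weight(update) = weight(update) + 1;
    end
    % Surface points (the fragment) can then be extracted near the TSDF zero-crossing.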

Dataset    Scene                                          Fragments      Evaluation Files
7-Scenes   redkitchen                                     .zip (40 MB)   .zip (1 MB)
SUN3D      home_at/home_at_scan1_2013_jan_1               .zip (44 MB)   .zip (1 MB)
SUN3D      home_md/home_md_scan9_2012_sep_30              .zip (36 MB)   .zip (1 MB)
SUN3D      hotel_uc/scan3                                 .zip (55 MB)   .zip (1 MB)
SUN3D      hotel_umd/maryland_hotel1                      .zip (51 MB)   .zip (1 MB)
SUN3D      hotel_umd/maryland_hotel3                      .zip (33 MB)   .zip (1 MB)
SUN3D      mit_76_studyroom/76-1studyroom2                .zip (82 MB)   .zip (1 MB)
SUN3D      mit_lab_hj/lab_hj_tea_nov_2_2012_scan1_erika   .zip (42 MB)   .zip (1 MB)

Evaluation

For evaluation, your geometric registration algorithm should determine whether each non-consecutive pair of fragments can be aligned, and if so, output the predicted rigid transformation to a log file. The format of this log file, and more information about the evaluation protocol, are described here. To compute precision and recall from your algorithm's log files, use the evaluation code here (see the Matlab script evaluation/geometric-registration/evaluate.m), or refer to this. We particularly seek registration methods that align fragments without requiring an initial alignment.
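
As a rough sketch of the expected output (please treat the protocol linked above as authoritative), each .log entry pairs a metadata line for a fragment pair with the rows of the estimated 4x4 transformation. A helper like the following could append one entry per aligned pair; the tab-separated layout and the meaning of the third metadata value are assumptions based on the registration log format of Choi et al.:

    % Minimal sketch: append one registration result to a .log file.
    % i, j are fragment indices, numFragments is the total number of fragments
    % in the scene, and T is the predicted 4x4 rigid transformation.
    function writeLogEntry(logPath, i, j, numFragments, T)
        fid = fopen(logPath, 'a');
        fprintf(fid, '%d\t%d\t%d\n', i, j, numFragments);
        fprintf(fid, '%.10f\t%.10f\t%.10f\t%.10f\n', T');   % prints the 4 rows of T
        fclose(fid);
    end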


Geometric Registration on Synthetic Data

From: S. Choi, Q.Y. Zhou, V. Koltun. Robust Reconstruction of Indoor Scenes. CVPR 2015.
In our paper, we also report 3DMatch's performance on the original synthetic benchmark from Robust Reconstruction of Indoor Scenes, where it achieves 65.1% recall and 25.2% precision. To reproduce our results, you can download the evaluation files for this benchmark here:

Code to run 3DMatch on both benchmarks can be found here. We also provide several files with intermediate data generated by our geometric registration pipeline for 3DMatch (as well as for the other descriptors we compared against), such as TDF voxel grids for all fragments, keypoints, and descriptor vectors. These can be useful if you wish to improve geometric registration results by designing better search algorithms (e.g. RANSAC variants). You can download them here:
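
As a starting point for such experiments, the sketch below shows one standard way to turn two fragments' keypoints and descriptor vectors into a rigid alignment: nearest-neighbor descriptor matching followed by a simple RANSAC loop with a least-squares (Kabsch) fit on minimal samples. All names, thresholds, and iteration counts are illustrative and are not those of our pipeline; knnsearch requires the Statistics and Machine Learning Toolbox.

    % Minimal sketch: rigid alignment from keypoints (Nx3) + descriptors (NxD).
    function bestT = ransacAlign(keypts1, desc1, keypts2, desc2)
        nnIdx = knnsearch(desc2, desc1);          % nearest descriptor in fragment 2
        p = keypts1; q = keypts2(nnIdx,:);        % putative correspondences
        bestT = eye(4); bestInliers = 0;
        inlierThresh = 0.05;                      % 5 cm (illustrative)
        for iter = 1:10000
            sample = randperm(size(p,1), 3);      % minimal sample of 3 correspondences
            T = fitRigid(p(sample,:), q(sample,:));
            pT = (T(1:3,1:3) * p' + T(1:3,4))';   % apply candidate transform
            inliers = sum(sqrt(sum((pT - q).^2, 2)) < inlierThresh);
            if inliers > bestInliers
                bestInliers = inliers; bestT = T;
            end
        end
    end

    function T = fitRigid(p, q)
        % Least-squares rigid transform mapping points p onto points q (Kabsch)
        cp = mean(p,1); cq = mean(q,1);
        [U,~,V] = svd((p - cp)' * (q - cq));
        R = V * diag([1 1 sign(det(V*U'))]) * U';
        T = [R, cq' - R*cp'; 0 0 0 1];
    end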





RGB-D Reconstruction Datasets

We use several existing RGB-D reconstruction datasets to train 3DMatch and to generate evaluation benchmarks. For ease of use and compatibility with our code (see Github), we've converted these datasets into a unified file structure and format, available for download in the links below. If you find any of these datasets useful, please cite their original paper(s):


Bash script to download all scenes: download.sh
Training and testing scenes split: split.txt

Dataset Scene RGB-D Data & Poses
SUN3D brown_bm_1/brown_bm_1 .zip (3.1 GB)
SUN3D brown_bm_4/brown_bm_4 .zip (1.5 GB)
SUN3D brown_cogsci_1/brown_cogsci_1 .zip (1.3 GB)
SUN3D brown_cs_2/brown_cs2 .zip (2.4 GB)
SUN3D brown_cs_3/brown_cs3 .zip (1.6 GB)
SUN3D harvard_c3/hv_c3_1 .zip (928 MB)
SUN3D harvard_c5/hv_c5_1 .zip (939 MB)
SUN3D harvard_c6/hv_c6_1 .zip (698 MB)
SUN3D harvard_c8/hv_c8_3 .zip (462 MB)
SUN3D harvard_c11/hv_c11_2 .zip (416 MB)
SUN3D home_at/home_at_scan1_2013_jan_1 .zip (7.2 GB)
SUN3D home_bksh/home_bksh_oct_30_2012_scan2_erika .zip (7.6 GB)
SUN3D home_md/home_md_scan9_2012_sep_30 .zip (6.6 GB)
SUN3D hotel_nips2012/nips_4 .zip (3.2 GB)
SUN3D hotel_sf/scan1 .zip (4.9 GB)
SUN3D hotel_uc/scan3 .zip (4.9 GB)
SUN3D hotel_umd/maryland_hotel1 .zip (2.6 GB)
SUN3D hotel_umd/maryland_hotel3 .zip (853 MB)
SUN3D mit_32_d507/d507_2 .zip (2.8 GB)
SUN3D mit_46_ted_lab1/ted_lab_2 .zip (4.7 GB)
SUN3D mit_76_417/76-417b .zip (5.7 GB)
SUN3D mit_76_studyroom/76-1studyroom2 .zip (1.5 GB)
SUN3D mit_dorm_next_sj/dorm_next_sj_oct_30_2012_scan1_erika .zip (1.4 GB)
SUN3D mit_lab_hj/lab_hj_tea_nov_2_2012_scan1_erika .zip (888 MB)
SUN3D mit_w20_athena/sc_athena_oct_29_2012_scan1_erika .zip (4.5 GB)
7-Scenes chess .zip (3.1 GB)
7-Scenes fire .zip (2.3 GB)
7-Scenes heads .zip (956 MB)
7-Scenes office .zip (4.7 GB)
7-Scenes pumpkin .zip (2.9 GB)
7-Scenes redkitchen .zip (6.1 GB)
7-Scenes stairs .zip (1.7 GB)
RGB-D Scenes v2 scene_01 .zip (349 MB)
RGB-D Scenes v2 scene_02 .zip (335 MB)
RGB-D Scenes v2 scene_03 .zip (341 MB)
RGB-D Scenes v2 scene_04 .zip (350 MB)
RGB-D Scenes v2 scene_05 .zip (534 MB)
RGB-D Scenes v2 scene_06 .zip (481 MB)
RGB-D Scenes v2 scene_07 .zip (439 MB)
RGB-D Scenes v2 scene_08 .zip (423 MB)
RGB-D Scenes v2 scene_09 .zip (334 MB)
RGB-D Scenes v2 scene_10 .zip (316 MB)
RGB-D Scenes v2 scene_11 .zip (281 MB)
RGB-D Scenes v2 scene_12 .zip (324 MB)
RGB-D Scenes v2 scene_13 .zip (188 MB)
RGB-D Scenes v2 scene_14 .zip (239 MB)
BundleFusion apt0 .zip (3.6 GB)
BundleFusion apt1 .zip (3.7 GB)
BundleFusion apt2 .zip (1.8 GB)
BundleFusion copyroom .zip (1.5 GB)
BundleFusion office0 .zip (2.6 GB)
BundleFusion office1 .zip (3.0 GB)
BundleFusion office2 .zip (1.8 GB)
BundleFusion office3 .zip (1.4 GB)
Analysis by Synthesis apt1-kitchen (depth) .zip (88 MB)
Analysis by Synthesis apt1-living (depth) .zip (116 MB)
Analysis by Synthesis apt2-bed (depth) .zip (75 MB)
Analysis by Synthesis apt2-kitchen (depth) .zip (72 MB)
Analysis by Synthesis apt2-living (depth) .zip (78 MB)
Analysis by Synthesis apt2-luke (depth) .zip (154 MB)
Analysis by Synthesis office2-5a (depth) .zip (110 MB)
Analysis by Synthesis office2-5b (depth) .zip (127 MB)

Note: the SUN3D scenes were reconstructed using the method of Halber et al. Please also cite their paper if you use the SUN3D scenes.

Dataset Format

Each scene is a folder containing one or more RGB-D video sequences. The folder contents are as follows:

camera-intrinsics.txt - a text file with depth camera intrinsics (3x3 matrix in homogeneous coordinates)
seq-XX
• frame-XXXXXX.color.png - a 24-bit PNG RGB color image.
• frame-XXXXXX.depth.png - a 16-bit PNG depth image, aligned to its corresponding color image. Depth is saved in millimeters (mm). Invalid depth is set to 0.
• frame-XXXXXX.pose.txt - a text file with the camera pose of the frame (camera-to-world, 4x4 matrix in homogeneous coordinates and in meters)
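
As a usage example of this format, the sketch below back-projects one depth frame into a point cloud in world coordinates. The frame path is illustrative, and the 0-indexed pixel convention for the intrinsics is an assumption:

    % Minimal sketch: back-project one depth frame to world coordinates.
    K     = dlmread('camera-intrinsics.txt');                        % 3x3 depth intrinsics
    depth = double(imread('seq-01/frame-000000.depth.png')) / 1000;  % mm -> meters
    pose  = dlmread('seq-01/frame-000000.pose.txt');                 % 4x4 camera-to-world

    [u, v] = meshgrid(0:size(depth,2)-1, 0:size(depth,1)-1);         % 0-indexed pixel grid
    z = depth(:)'; valid = z > 0;                                     % depth of 0 = invalid
    x = (u(:)' - K(1,3)) .* z / K(1,1);                               % back-project to camera coords
    y = (v(:)' - K(2,3)) .* z / K(2,2);
    camPts   = [x(valid); y(valid); z(valid)];                        % 3xM camera-space points
    worldPts = pose(1:3,1:3) * camPts + pose(1:3,4);                  % camera -> world (meters)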

License Agreements

7-Scenes: The data is provided for non-commercial use only. License Agreement
BundleFusion: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License
Analysis by Synthesis: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License



Page last updated: 10-Mar-2017
Posted by: Andy Zeng