3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions


Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose an unsupervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also to generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin.

Overview

From existing RGB-D reconstructions (a), we extract local 3D patches and correspondence labels from scans of different views (b). We collect pairs of matching and non-matching local 3D patches and convert them into a volumetric representation (c) to train a 3D ConvNet-based descriptor (d). This geometric descriptor can be used to establish correspondences for matching 3D geometry in various applications (e) such as reconstruction, model alignment, and surface correspondence.

Paper

Latest version (5 Dec 2016): arXiv:1603.08182 [cs.CV] or here
Older version (27 Mar 2016): 3DMatch: Learning the Matching of Local 3D Geometry in Range Scans

To appear at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017 as an oral presentation.


Bibtex

@inproceedings{zeng20163dmatch,
    title={3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions},
    author={Zeng, Andy and Song, Shuran and Nie{\ss}ner, Matthias and Fisher, Matthew and Xiao, Jianxiong and
            Funkhouser, Thomas},
    booktitle={CVPR},
    year={2017}
}

Video



Code

All 3DMatch code (for training and testing) can be found in our Github repository here.




Keypoint Matching Benchmark

This benchmark evaluates how well descriptors (both 2D and 3D) can establish correspondences between RGB-D frames of different views. The dataset contains 2D RGB-D patches and 3D patches (local TDF voxel grid volumes) of wide-baseline correspondences, which are sampled from our testing split of the RGB-D reconstruction datasets. The pixel size of each 2D patch is determined by projecting the 0.3m³ local 3D patch around the interest point onto the image plane. We provide Matlab code for generating similar correspondence datasets here. Although our baselines are 3D approaches that use depth information only, we are also looking for descriptor algorithms (2D or 3D) that leverage color information, or both modalities.

Benchmark Leaderboard

Method            Error    2D-Based   3D-Based   Uses Color   Uses Depth
3DMatch [1]       35.3%    no         yes        no           yes
FPFH [3]          61.3%    no         yes        no           yes
Spin-Images [2]   83.7%    no         yes        no           yes

To add your results to the leaderboard, please email us your algorithm's .log file for the test set to andyz[at]princeton[dot]edu

[1] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, T. Funkhouser. 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. CVPR 2017.
[2] A.E. Johnson, M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. PAMI 1999.
[3] R.B. Rusu, N. Blodow, M. Beetz. Fast point feature histograms (FPFH) for 3D registration. ICRA 2009.

Download and Description

There are two Matlab .mat files, one for the validation set and one for the test set. Download links:

The validation set contains 10,000 pairs of RGB-D patches and their ground truth correspondence labels (binary: 1 for match, 0 for non-match). The test set contains similar data for another 10,000 pairs, except that the ground truth correspondence labels have been left out. Each Matlab .mat file contains the following variables:

data - a 10,000x2 cell array of structs. Each struct contains the 2D/3D patch data of an interest point, with the variables:
• framePath - path to the scene, sequence, and RGB-D frame from which the patch data was extracted
• pixelCoords - 1x2 array with the pixel coordinates of the interest point on the RGB-D frame
• camCoords - 3x1 array with the 3D camera coordinates of the interest point
• bboxCornersCam - 3x8 matrix with the 3D camera coordinates of the corners of the 0.3m³ bounding box around the interest point
• bboxRangePixels - 2x2 matrix where each row is a pixel-coordinate corner of the projected bounding box on the image plane
• camK - 3x3 matrix of the camera intrinsics
• colorPatch - HxWx3 uint8 matrix of the RGB patch around the interest point
• depthPatch - HxW matrix of the depth patch (in meters) around the interest point
• voxelGridTDF - 30x30x30 matrix of TDF voxel grid values (voxel size is 0.01m) around the interest point
labels - (in validation-set.mat only) a 10,000x1 cell array of binary correspondence labels (1 for match, 0 for non-match) for each pair (row) of interest points saved in data.

Update (as of Mar 2018): for convenience, feel free to download the labels for the test set here.
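
As a quick sanity check after downloading, the following Matlab sketch loads the validation set, reads the two TDF volumes of the first patch pair, and projects the bounding box corners of one patch onto its image plane. File and field names follow the description above; everything else is illustrative:

    % Minimal sketch: inspect one patch pair from the validation set.
    % Assumes validation-set.mat has been downloaded to the working directory.
    load('validation-set.mat');                % loads 'data' and 'labels'

    pair  = data(1,:);                         % first pair of interest points
    label = labels{1};                         % 1 = match, 0 = non-match

    % 30x30x30 TDF volumes that a 3D descriptor would consume
    tdf1 = pair{1}.voxelGridTDF;
    tdf2 = pair{2}.voxelGridTDF;

    % Project the 0.3m³ bounding box corners of the first patch onto its image
    % plane; this projection determines the pixel size of the 2D patch
    proj = pair{1}.camK * pair{1}.bboxCornersCam;        % 3x8 projected corners
    pixX = proj(1,:) ./ proj(3,:);
    pixY = proj(2,:) ./ proj(3,:);
    fprintf('Pair 1: label = %d, 2D patch spans ~%.0f x %.0f pixels\n', ...
            label, max(pixX) - min(pixX), max(pixY) - min(pixY));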

We do not provide a fixed training set; however, we do provide the C++/CUDA code here that we used to sample training correspondences on the fly when training 3DMatch. You can also generate your own fixed training set by following the instructions here (see makeCorresDataset.m), using the training scenes instead of the testing scenes.
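
For intuition, the sketch below illustrates the basic idea behind sampling one matching correspondence from two registered RGB-D frames of the same scene. This is not the released C++/CUDA code; the frame paths, the 2 cm depth-agreement threshold, and the 0-indexed intrinsics convention are illustrative assumptions, and the per-frame file format is described in the RGB-D Reconstruction Datasets section below:

    % Minimal sketch: sample a matching pixel correspondence from two frames.
    K      = dlmread('camera-intrinsics.txt');                      % 3x3 intrinsics
    depth1 = double(imread('seq-01/frame-000000.depth.png')) / 1000; % mm -> meters
    depth2 = double(imread('seq-01/frame-000050.depth.png')) / 1000;
    pose1  = dlmread('seq-01/frame-000000.pose.txt');               % camera-to-world
    pose2  = dlmread('seq-01/frame-000050.pose.txt');

    % Pick a random pixel with valid depth in frame 1 and lift it to world space
    [vs, us] = find(depth1 > 0);
    k  = randi(numel(us));
    u1 = us(k) - 1; v1 = vs(k) - 1;                  % 0-indexed pixel coordinates
    z1 = depth1(vs(k), us(k));
    camPt   = [(u1 - K(1,3)) * z1 / K(1,1); (v1 - K(2,3)) * z1 / K(2,2); z1];
    worldPt = pose1(1:3,1:3) * camPt + pose1(1:3,4);

    % Project the world point into frame 2; if the observed depth agrees, the
    % two pixels are a matching correspondence (non-matches are sampled far away)
    camPt2 = pose2(1:3,1:3)' * (worldPt - pose2(1:3,4));
    pix    = K * camPt2;
    u2 = round(pix(1)/pix(3)) + 1; v2 = round(pix(2)/pix(3)) + 1;
    if camPt2(3) > 0 && u2 >= 1 && v2 >= 1 && ...
            u2 <= size(depth2,2) && v2 <= size(depth2,1) && ...
            abs(depth2(v2,u2) - camPt2(3)) < 0.02
        fprintf('Match: (%d,%d) in frame 0 <-> (%d,%d) in frame 50\n', u1, v1, u2-1, v2-1);
    end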

Evaluation

To evaluate on this benchmark, your descriptor algorithm should output a .log file where each row contains the descriptor distance (or, for some algorithms, the confidence of non-correspondence) for the corresponding pair of patches. Our Github toolbox contains an example .log file (for 3DMatch), as well as an example evaluation script for the validation set. Error is computed as the false positive rate at 95% recall.
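
As a rough illustration of the metric (not the reference evaluation script), assuming a 10,000x1 vector dists of your descriptor distances in the same row order as data, the error can be computed on the validation set as follows:

    % Minimal sketch: false positive rate at 95% recall.
    % 'dists' holds descriptor distances (lower = more similar), e.g. read from
    % your algorithm's .log file; 'labels' comes from validation-set.mat and is
    % assumed to hold one scalar 0/1 label per cell.
    gt = cell2mat(labels);                            % 1 = match, 0 = non-match
    [sortedDists, order] = sort(dists, 'ascend');
    sortedGt = gt(order);

    recall   = cumsum(sortedGt) / sum(sortedGt);      % recall as the threshold grows
    idx      = find(recall >= 0.95, 1, 'first');      % smallest threshold reaching 95% recall
    numFalse = sum(sortedGt(1:idx) == 0);             % non-matches accepted at that threshold
    fpr      = numFalse / sum(gt == 0);               % false positive rate
    fprintf('Error (FPR at 95%% recall): %.1f%%\n', fpr * 100);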




Geometric Registration Benchmark

Similar in spirit to the registration benchmark from Robust Reconstruction of Indoor Scenes, this benchmark evaluates the performance of geometric registration algorithms in the context of scene reconstruction. However, in contrast to prior work, this benchmark uses real-world RGB-D scanning data instead of synthetic data, in order to promote registration algorithms that are robust to depth data from modern commodity range sensors (e.g. Microsoft Kinect, Intel RealSense).

Benchmark Leaderboard

Method            Recall   Precision
3DMatch [1]       66.8%    40.1%
Spin-Images [2]   51.8%    31.6%
FPFH [3]          44.2%    30.7%

To add your results to the leaderboard, please email us at andyz[at]princeton[dot]edu

[1] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, T. Funkhouser. 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. CVPR 2017.
[2] A.E. Johnson, M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. PAMI 1999.
[3] R.B. Rusu, N. Blodow, M. Beetz. Fast point feature histograms (FPFH) for 3D registration. ICRA 2009.

Downloads

This benchmark contains eight sets of scene fragments created from our testing split of the RGB-D reconstruction datasets. These fragments are available for download in the links below. Each fragment is a 3D point cloud of a surface, integrated from 50 depth frames using TSDF volumetric fusion, and saved to a .ply file. We also provide the fusion code to generate these fragments here.
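
For intuition only, here is a minimal Matlab sketch of TSDF integration over 50 depth frames. This is not the released fusion code; the grid bounds, voxel size, and truncation distance are placeholder values, and the per-frame file names follow the dataset format described in the RGB-D Reconstruction Datasets section below:

    % Minimal TSDF fusion sketch (placeholder parameters, projective SDF).
    K = dlmread('camera-intrinsics.txt');             % 3x3 depth intrinsics
    voxelSize = 0.02; trunc = 0.05;                    % 2 cm voxels, 5 cm truncation
    gridDim = [256 256 256]; gridOrigin = [-2.5 -2.5 0];   % example bounds (meters)
    [gx,gy,gz] = ndgrid(0:gridDim(1)-1, 0:gridDim(2)-1, 0:gridDim(3)-1);
    worldPts = [gx(:) gy(:) gz(:)] * voxelSize + gridOrigin;   % Nx3 voxel centers
    clear gx gy gz;
    tsdf = ones(size(worldPts,1),1); weight = zeros(size(worldPts,1),1);

    for i = 0:49                                       % fuse 50 consecutive frames
        depth = double(imread(sprintf('seq-01/frame-%06d.depth.png',i))) / 1000;  % mm -> m
        pose  = dlmread(sprintf('seq-01/frame-%06d.pose.txt',i));   % 4x4 camera-to-world
        camPts = (pose(1:3,1:3)' * (worldPts' - pose(1:3,4)))';     % world -> camera
        pix = (K * camPts')';                          % project voxel centers to pixels
        u = round(pix(:,1)./pix(:,3)) + 1;             % +1 assumes 0-indexed intrinsics
        v = round(pix(:,2)./pix(:,3)) + 1;
        valid = camPts(:,3) > 0 & u >= 1 & v >= 1 & u <= size(depth,2) & v <= size(depth,1);
        d = zeros(size(u)); d(valid) = depth(sub2ind(size(depth), v(valid), u(valid)));
        sdf = d - camPts(:,3);                         % distance to observed surface along z
        update = valid & d > 0 & sdf > -trunc;         % skip voxels far behind the surface
        tsdf(update) = (tsdf(update).*weight(update) + min(1, sdf(update)/trunc)) ...
                       ./ (weight(update) + 1);        % running weighted average
        weight(update) = weight(update) + 1;
    end
    % Surface points (the fragment) can then be extracted near the TSDF zero-crossing.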

Dataset    Scene                                          Fragments      Evaluation Files
7-Scenes   redkitchen                                     .zip (40 MB)   .zip (1 MB)
SUN3D      home_at/home_at_scan1_2013_jan_1               .zip (44 MB)   .zip (1 MB)
SUN3D      home_md/home_md_scan9_2012_sep_30              .zip (36 MB)   .zip (1 MB)
SUN3D      hotel_uc/scan3                                 .zip (55 MB)   .zip (1 MB)
SUN3D      hotel_umd/maryland_hotel1                      .zip (51 MB)   .zip (1 MB)
SUN3D      hotel_umd/maryland_hotel3                      .zip (33 MB)   .zip (1 MB)
SUN3D      mit_76_studyroom/76-1studyroom2                .zip (82 MB)   .zip (1 MB)
SUN3D      mit_lab_hj/lab_hj_tea_nov_2_2012_scan1_erika   .zip (42 MB)   .zip (1 MB)

Evaluation

For evaluation, your geometric registration algorithm should determine whether each non-consecutive pair of fragments can be aligned, and if so, output the predicted rigid transformation to a log file. The format of this log file, and more information about the evaluation protocol, are described here. To compute precision and recall from your algorithm's log files, use the evaluation code here (see the Matlab script evaluation/geometric-registration/evaluate.m), or refer to this. We particularly seek registration methods that align fragments without requiring an initial alignment.
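
As a rough sketch of the expected output (please treat the protocol linked above as authoritative), each .log entry pairs a metadata line for a fragment pair with the rows of the estimated 4x4 transformation. A helper like the following could append one entry per aligned pair; the tab-separated layout and the meaning of the third metadata value are assumptions based on the registration log format of Choi et al.:

    % Minimal sketch: append one registration result to a .log file.
    % i, j are fragment indices, numFragments is the total number of fragments
    % in the scene, and T is the predicted 4x4 rigid transformation.
    function writeLogEntry(logPath, i, j, numFragments, T)
        fid = fopen(logPath, 'a');
        fprintf(fid, '%d\t%d\t%d\n', i, j, numFragments);
        fprintf(fid, '%.10f\t%.10f\t%.10f\t%.10f\n', T');   % prints the 4 rows of T
        fclose(fid);
    end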


Geometric Registration on Synthetic Data

From: S. Choi, Q.Y. Zhou, V. Koltun. Robust Reconstruction of Indoor Scenes. CVPR 2015.
In our paper, we also report 3DMatch's performance on the original synthetic benchmark from Robust Reconstruction of Indoor Scenes, where it achieves 65.1% recall and 25.2% precision. To reproduce our results, you can download the evaluation files for this benchmark here:

Code to run 3DMatch on both benchmarks can be found here. We also provide several files with intermediate data generated by our geometric registration pipeline for 3DMatch (as well as for the other descriptors we compared against), such as TDF voxel grids for all fragments, keypoints, and descriptor vectors. These can be useful if you wish to improve geometric registration results by designing better search algorithms (e.g. RANSAC variants). You can download them here:
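
As a starting point for such experiments, the sketch below shows one standard way to turn two fragments' keypoints and descriptor vectors into a rigid alignment: nearest-neighbor descriptor matching followed by a simple RANSAC loop with a least-squares (Kabsch) fit on minimal samples. All names, thresholds, and iteration counts are illustrative and are not those of our pipeline; knnsearch requires the Statistics and Machine Learning Toolbox.

    % Minimal sketch: rigid alignment from keypoints (Nx3) + descriptors (NxD).
    function bestT = ransacAlign(keypts1, desc1, keypts2, desc2)
        nnIdx = knnsearch(desc2, desc1);          % nearest descriptor in fragment 2
        p = keypts1; q = keypts2(nnIdx,:);        % putative correspondences
        bestT = eye(4); bestInliers = 0;
        inlierThresh = 0.05;                      % 5 cm (illustrative)
        for iter = 1:10000
            sample = randperm(size(p,1), 3);      % minimal sample of 3 correspondences
            T = fitRigid(p(sample,:), q(sample,:));
            pT = (T(1:3,1:3) * p' + T(1:3,4))';   % apply candidate transform
            inliers = sum(sqrt(sum((pT - q).^2, 2)) < inlierThresh);
            if inliers > bestInliers
                bestInliers = inliers; bestT = T;
            end
        end
    end

    function T = fitRigid(p, q)
        % Least-squares rigid transform mapping points p onto points q (Kabsch)
        cp = mean(p,1); cq = mean(q,1);
        [U,~,V] = svd((p - cp)' * (q - cq));
        R = V * diag([1 1 sign(det(V*U'))]) * U';
        T = [R, cq' - R*cp'; 0 0 0 1];
    end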





RGB-D Reconstruction Datasets

We use several existing RGB-D reconstruction datasets to train 3DMatch and to generate evaluation benchmarks. For ease of use and compatibility with our code (see Github), we've converted these datasets into a unified file structure and format, available for download in the links below. If you find any of these datasets useful, please cite their original paper(s):


Bash script to download all scenes: download.sh
Training and testing scenes split: split.txt

Dataset Scene RGB-D Data & Poses
SUN3D brown_bm_1/brown_bm_1 .zip (3.1 GB)
SUN3D brown_bm_4/brown_bm_4 .zip (1.5 GB)
SUN3D brown_cogsci_1/brown_cogsci_1 .zip (1.3 GB)
SUN3D brown_cs_2/brown_cs2 .zip (2.4 GB)
SUN3D brown_cs_3/brown_cs3 .zip (1.6 GB)
SUN3D harvard_c3/hv_c3_1 .zip (928 MB)
SUN3D harvard_c5/hv_c5_1 .zip (939 MB)
SUN3D harvard_c6/hv_c6_1 .zip (698 MB)
SUN3D harvard_c8/hv_c8_3 .zip (462 MB)
SUN3D harvard_c11/hv_c11_2 .zip (416 MB)
SUN3D home_at/home_at_scan1_2013_jan_1 .zip (7.2 GB)
SUN3D home_bksh/home_bksh_oct_30_2012_scan2_erika .zip (7.6 GB)
SUN3D home_md/home_md_scan9_2012_sep_30 .zip (6.6 GB)
SUN3D hotel_nips2012/nips_4 .zip (3.2 GB)
SUN3D hotel_sf/scan1 .zip (4.9 GB)
SUN3D hotel_uc/scan3 .zip (4.9 GB)
SUN3D hotel_umd/maryland_hotel1 .zip (2.6 GB)
SUN3D hotel_umd/maryland_hotel3 .zip (853 MB)
SUN3D mit_32_d507/d507_2 .zip (2.8 GB)
SUN3D mit_46_ted_lab1/ted_lab_2 .zip (4.7 GB)
SUN3D mit_76_417/76-417b .zip (5.7 GB)
SUN3D mit_76_studyroom/76-1studyroom2 .zip (1.5 GB)
SUN3D mit_dorm_next_sj/dorm_next_sj_oct_30_2012_scan1_erika .zip (1.4 GB)
SUN3D mit_lab_hj/lab_hj_tea_nov_2_2012_scan1_erika .zip (888 MB)
SUN3D mit_w20_athena/sc_athena_oct_29_2012_scan1_erika .zip (4.5 GB)
7-Scenes chess .zip (3.1 GB)
7-Scenes fire .zip (2.3 GB)
7-Scenes heads .zip (956 MB)
7-Scenes office .zip (4.7 GB)
7-Scenes pumpkin .zip (2.9 GB)
7-Scenes redkitchen .zip (6.1 GB)
7-Scenes stairs .zip (1.7 GB)
RGB-D Scenes v2 scene_01 .zip (349 MB)
RGB-D Scenes v2 scene_02 .zip (335 MB)
RGB-D Scenes v2 scene_03 .zip (341 MB)
RGB-D Scenes v2 scene_04 .zip (350 MB)
RGB-D Scenes v2 scene_05 .zip (534 MB)
RGB-D Scenes v2 scene_06 .zip (481 MB)
RGB-D Scenes v2 scene_07 .zip (439 MB)
RGB-D Scenes v2 scene_08 .zip (423 MB)
RGB-D Scenes v2 scene_09 .zip (334 MB)
RGB-D Scenes v2 scene_10 .zip (316 MB)
RGB-D Scenes v2 scene_11 .zip (281 MB)
RGB-D Scenes v2 scene_12 .zip (324 MB)
RGB-D Scenes v2 scene_13 .zip (188 MB)
RGB-D Scenes v2 scene_14 .zip (239 MB)
BundleFusion apt0 .zip (3.6 GB)
BundleFusion apt1 .zip (3.7 GB)
BundleFusion apt2 .zip (1.8 GB)
BundleFusion copyroom .zip (1.5 GB)
BundleFusion office0 .zip (2.6 GB)
BundleFusion office1 .zip (3.0 GB)
BundleFusion office2 .zip (1.8 GB)
BundleFusion office3 .zip (1.4 GB)
Analysis by Synthesis apt1-kitchen (depth) .zip (88 MB)
Analysis by Synthesis apt1-living (depth) .zip (116 MB)
Analysis by Synthesis apt2-bed (depth) .zip (75 MB)
Analysis by Synthesis apt2-kitchen (depth) .zip (72 MB)
Analysis by Synthesis apt2-living (depth) .zip (78 MB)
Analysis by Synthesis apt2-luke (depth) .zip (154 MB)
Analysis by Synthesis office2-5a (depth) .zip (110 MB)
Analysis by Synthesis office2-5b (depth) .zip (127 MB)

Note: the SUN3D scenes were reconstructed using the method of Halber et al. Please also cite their paper if you use the SUN3D scenes.

Dataset Format

Each scene is a folder containing one or more RGB-D video sequences. The folder contents are as follows:

camera-intrinsics.txt - a text file with depth camera intrinsics (3x3 matrix in homogeneous coordinates)
seq-XX
• frame-XXXXXX.color.png - a 24-bit PNG RGB color image.
• frame-XXXXXX.depth.png - a 16-bit PNG depth image, aligned to its corresponding color image. Depth is saved in millimeters (mm). Invalid depth is set to 0.
• frame-XXXXXX.pose.txt - a text file with the camera pose of the frame (camera-to-world, 4x4 matrix in homogeneous coordinates and in meters)
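
As a usage example of this format, the sketch below back-projects one depth frame into a point cloud in world coordinates. The frame path is illustrative, and the 0-indexed pixel convention for the intrinsics is an assumption:

    % Minimal sketch: back-project one depth frame to world coordinates.
    K     = dlmread('camera-intrinsics.txt');                        % 3x3 depth intrinsics
    depth = double(imread('seq-01/frame-000000.depth.png')) / 1000;  % mm -> meters
    pose  = dlmread('seq-01/frame-000000.pose.txt');                 % 4x4 camera-to-world

    [u, v] = meshgrid(0:size(depth,2)-1, 0:size(depth,1)-1);         % 0-indexed pixel grid
    z = depth(:)'; valid = z > 0;                                     % depth of 0 = invalid
    x = (u(:)' - K(1,3)) .* z / K(1,1);                               % back-project to camera coords
    y = (v(:)' - K(2,3)) .* z / K(2,2);
    camPts   = [x(valid); y(valid); z(valid)];                        % 3xM camera-space points
    worldPts = pose(1:3,1:3) * camPts + pose(1:3,4);                  % camera -> world (meters)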

License Agreements

7-Scenes: The data is provided for non-commercial use only. License Agreement
BundleFusion: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License
Analysis by Synthesis: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License



Page last updated: 10-Mar-2017
Posted by: Andy Zeng