VGGT → Open3D → BUFFER-X → ICP: Full Registration Pipeline


This post documents the full 4-stage registration pipeline developed as part of the Path Matters project at TU Berlin. Starting from a VGGT reconstruction, we manually clean and align the point cloud using Open3D, then run BUFFER-X for coarse global registration, and finally refine with ICP, producing a visual overlay at every stage and quantitative alignment metrics for the final result.


Overview

A key challenge in robotic 3D reconstruction is aligning a reconstructed point cloud to a ground truth model for quality evaluation. This pipeline solves that end-to-end:

The 4-stage pipeline:

Raw reconstruction → Manual crop + alignment → BUFFER-X initial alignment → ICP final refinement → Metrics

Test object: Air conditioner control unit — real-world scan from our Reallife Dataset


System Setup

| Component | Detail |
|---|---|
| Reconstruction | VGGT (Visual Geometry Grounded Transformer) |
| Manual alignment | Open3D interactive crop + point picking |
| Initial registration | BUFFER-X (zero-shot, threedmatch model) |
| Final refinement | ICP, point-to-plane |
| Environment | bufferx_o3d conda environment |
| Dataset | Reallife_Dataset, scene air_conditioner_control_camera3 |

How to Run

Activate environment

conda activate bufferx_o3d

Run the full pipeline

python3 /home/AP_PathMatters/path_matters/haroun/Pipeline/cc_bufferx_pipeline_package/run_cc_bufferx_pipeline.py \
  --recon-root /home/AP_PathMatters/path_matters/datasets/Reallife_Dataset_Haroun_Aziz/scenes-others_SUBSAMPLED \
  --gt-root /home/AP_PathMatters/path_matters/datasets/Reallife_Dataset_Haroun_Aziz/scenes-others_SUBSAMPLED \
  --output-base /home/AP_PathMatters/path_matters/haroun/runs \
  --run-name test_bufferx_before_icp \
  --bufferx-root /home/AP_PathMatters/BUFFER-X \
  --bufferx-env bufferx_o3d \
  --scene-names air_conditioner_control_camera3 \
  --recon-candidates recon_generated/vggt/points.ply \
  --manual-mode require \
  --manual-backend open3d \
  --open3d-crop \
  --open3d-pre-scale aabb_diag \
  --open3d-with-scaling \
  --save-viz \
  --show-final-viz

Stage 1 — Raw Overlay

The raw VGGT reconstruction and ground truth are loaded and overlaid before any alignment. At this stage the two clouds are in completely different coordinate frames, and generally at different scales as well.

Stage 1: Raw overlay — reconstruction (yellow) vs ground truth (cyan), unaligned


Stage 2 — Manual Open3D Alignment

Before BUFFER-X runs, the reconstruction is manually cleaned and coarsely aligned using Open3D’s interactive tools.

Step 1 — Crop window

The Open3D crop window removes background clutter from the reconstruction, isolating only the target object.

Controls used:

  • Y twice — align view
  • K — enter selection mode
  • Drag mouse — rectangle selection around object
  • C — crop
  • Q — continue
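When the object's extent is already known (for example to reproduce a crop, or as a first step toward automating it), an axis-aligned box crop is a simple non-interactive stand-in for the rectangle selection above. Open3D exposes this as `PointCloud.crop` with an `AxisAlignedBoundingBox`; the numpy sketch below shows the same masking logic. The function name and the demo box bounds are hypothetical, and a screen-space rectangle drawn in the interactive window is not in general equivalent to an axis-aligned box in 3D.

```python
import numpy as np


def crop_aabb(points: np.ndarray, min_bound, max_bound) -> np.ndarray:
    """Keep only points inside the axis-aligned box [min_bound, max_bound] (inclusive)."""
    pts = np.asarray(points, dtype=float)
    lo = np.asarray(min_bound, dtype=float)
    hi = np.asarray(max_bound, dtype=float)
    inside = np.all((pts >= lo) & (pts <= hi), axis=1)  # per-point test on x, y, z
    return pts[inside]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj = rng.uniform(-0.1, 0.1, size=(500, 3))       # stand-in for the target object
    clutter = rng.uniform(-2.0, 2.0, size=(2000, 3))  # stand-in for table and walls
    cloud = np.vstack([obj, clutter])
    cropped = crop_aabb(cloud, [-0.1, -0.1, -0.1], [0.1, 0.1, 0.1])
    print(len(cloud), "->", len(cropped))
```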

Step 2 — Point picking

After cropping, at least 3 pairs of corresponding points are picked manually in the source and target clouds, and an initial transform is computed from these pairs.

Controls used:

  • Shift + Left Click — pick point
  • Shift + Right Click — undo
  • Pick at least 3 points in the source, then the matching points in the target, in the same order
  • Q — continue
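Given the picked pairs, the initial transform is a least-squares fit. The numpy sketch below shows the closed-form similarity fit (Umeyama's method). Open3D's manual-registration workflow solves the same kind of problem with `VisualizerWithEditing.get_picked_points()` followed by `TransformationEstimationPointToPoint(with_scaling=True).compute_transformation(source, target, corres)`. We assume, without having checked the pipeline code, that `--open3d-with-scaling` corresponds to enabling the scale term; the function name `estimate_similarity` is hypothetical.

```python
import numpy as np


def estimate_similarity(src, dst, with_scaling: bool = True) -> np.ndarray:
    """Closed-form (Umeyama) fit of dst ~ s * R @ src + t from N >= 3 picked pairs.

    Row i of src corresponds to row i of dst. Returns a 4x4 homogeneous matrix;
    with with_scaling=False the scale s is fixed to 1 (a rigid fit).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 3:
        raise ValueError("need at least 3 corresponding point pairs of equal shape")
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # flip to avoid a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    if with_scaling:
        var_s = (xs ** 2).sum() / len(src)        # variance of the source points
        s = np.trace(np.diag(D) @ S) / var_s
    else:
        s = 1.0
    T = np.eye(4)
    T[:3, :3] = s * R
    T[:3, 3] = mu_d - s * R @ mu_s
    return T
```

Three pairs is the minimum, and the three points must not be collinear; picking a few more, spread over the object, averages out clicking error.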

Stage 2: After manual crop and Open3D point-picking alignment

Key setting, pre-scale: --open3d-pre-scale aabb_diag automatically scales the reconstruction so that its axis-aligned bounding-box (AABB) diagonal roughly matches that of the ground truth before point picking. This is critical because the scale of VGGT output generally differs from real-world scale.
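A minimal sketch of an AABB-diagonal pre-scale, assuming the option scales the reconstruction about its centroid by the ratio of the two bounding-box diagonals. The actual implementation in run_cc_bufferx_pipeline.py may differ, and the function names are hypothetical.

```python
import numpy as np


def aabb_diagonal(points) -> float:
    """Length of the diagonal of the axis-aligned bounding box of the points."""
    pts = np.asarray(points, dtype=float)
    return float(np.linalg.norm(pts.max(axis=0) - pts.min(axis=0)))


def prescale_to_target(source, target):
    """Scale source about its centroid so its AABB diagonal matches the target's.

    Returns (scaled_source, scale_factor).
    """
    src = np.asarray(source, dtype=float)
    scale = aabb_diagonal(target) / aabb_diagonal(src)
    center = src.mean(axis=0)
    return center + scale * (src - center), scale
```

Because an axis-aligned box depends on the cloud's orientation, and the two clouds are still in different frames at this point, the ratio is only an approximate scale. Leftover background points inflate the diagonal, which is another reason to crop first.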


Stage 3 — BUFFER-X Initial Alignment

After manual alignment, BUFFER-X runs global (initialization-free) registration on the cropped, pre-aligned clouds, estimating a coarse transform that serves as a reliable starting point for ICP.

Stage 3: After BUFFER-X global registration

BUFFER-X Results

| Metric | Value |
|---|---|
| Source points | 28,032 |
| Target points | 30,000 |
| Voxel size | 0.0051 m |
| Sphericity | 0.0246 |
| Inference time | ~0.51 s |
| Model | threedmatch (zero-shot) |

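The reported sphericity of 0.0246 is low. Assuming BUFFER-X uses the common PCA-based definition (smallest over largest eigenvalue of the point covariance), this means the cloud is strongly anisotropic, i.e. much flatter or more elongated than a sphere, and BUFFER-X takes this into account when choosing its voxel size. Sphericity summarizes global shape, so on its own it does not show that descriptor matching is reliable; the ICP metrics below are the better evidence that registration succeeded.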
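The sphericity value can be sanity-checked from the point cloud itself. A minimal sketch, assuming the PCA-based definition above (whether BUFFER-X computes it on exactly the same points, e.g. before or after downsampling, is not stated here):

```python
import numpy as np


def sphericity(points) -> float:
    """lambda_min / lambda_max of the 3x3 point covariance (near 0: flat or linear, 1: isotropic)."""
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts, rowvar=False)      # columns are x, y, z
    eigvals = np.linalg.eigvalsh(cov)    # ascending order
    return float(eigvals[0] / eigvals[-1])
```

A thin, panel-like object gives values close to 0 and a blob-like one values close to 1, which is why the number describes shape rather than matching quality.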


Stage 4 — ICP Final Refinement

The BUFFER-X transform is used as the initial pose for point-to-plane ICP, which iteratively refines the alignment. On this scene it reaches an inlier RMSE of 2.87 cm and a median point distance of 2.6 cm.
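In Open3D this step is typically `o3d.pipelines.registration.registration_icp(source, target, max_correspondence_distance, init, o3d.pipelines.registration.TransformationEstimationPointToPlane())`, which needs normals on the target cloud. The numpy sketch below illustrates what each point-to-plane iteration does: match every source point to its nearest target point, linearize the rotation, and solve a 6-parameter least-squares problem that penalizes only the distance along the target normal. It uses brute-force nearest neighbours, so it is only meant for small clouds, and the function and parameter names are hypothetical rather than Open3D's.

```python
import numpy as np


def rodrigues(w: np.ndarray) -> np.ndarray:
    """Rotation matrix for an axis-angle vector w (angle = |w| in radians)."""
    theta = float(np.linalg.norm(w))
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def nearest_neighbors(src: np.ndarray, dst: np.ndarray):
    """Brute-force nearest target index and distance for every source point."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])


def icp_point_to_plane(src, dst, dst_normals, init=None, max_dist=0.05, iters=30):
    """Estimate a 4x4 rigid transform mapping src onto the surface sampled by dst."""
    T = np.eye(4) if init is None else np.array(init, dtype=float)
    for _ in range(iters):
        p = src @ T[:3, :3].T + T[:3, 3]         # source under the current estimate
        idx, dist = nearest_neighbors(p, dst)
        keep = dist < max_dist                    # drop far-away correspondences
        if keep.sum() < 6:                        # too few to constrain 6 DOF
            break
        p, q, n = p[keep], dst[idx[keep]], dst_normals[idx[keep]]
        r = ((p - q) * n).sum(axis=1)             # signed point-to-plane residuals
        J = np.hstack([np.cross(p, n), n])        # dr/d(omega, t) for R ~ I + [omega]x
        x = np.linalg.solve(J.T @ J, -J.T @ r)    # Gauss-Newton step
        dT = np.eye(4)
        dT[:3, :3] = rodrigues(x[:3])
        dT[:3, 3] = x[3:]
        T = dT @ T                                # compose the increment on the left
        if np.linalg.norm(x) < 1e-10:
            break
    return T
```

Penalizing only the normal component lets points slide along smooth surfaces, which is why point-to-plane ICP usually converges in fewer iterations than point-to-point ICP on objects like this one. Like any ICP variant, it still only converges locally, so the quality of the initial pose matters.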

Stage 4: Final result after ICP Point-to-Plane refinement

ICP Results

| Metric | Value | Interpretation |
|---|---|---|
| ICP fitness | 0.817 | 81.7% of points matched ✅ |
| ICP inlier RMSE | 0.0287 m | 2.87 cm on matched points |
| Overall RMSE | 0.051 m | 5.1 cm including outliers |
| Median distance | 0.026 m | 2.6 cm; half of the points lie within this |
| P90 distance | 0.067 m | 90% of points within 6.7 cm |
| P95 distance | 0.090 m | 95% of points within 9.0 cm |
| Max distance | 0.284 m | Worst-case outlier |
| ICP mode | Point-to-plane | Standard for smooth surfaces |
| Point count | 28,555 | Total evaluated correspondences |

What these numbers mean

A fitness of 0.817 is a strong result: 81.7% of the reconstruction points have a ground-truth point within the ICP correspondence-distance threshold. This is significantly better than our earlier Baby Yoda test (0.33), most likely because the background was properly removed before registration this time.

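The inlier RMSE of 2.87 cm is the root-mean-square distance over matched points, not a plain average. It is not directly comparable to the 5.79 cm RTE reported for BUFFER-X on the 3DMatch benchmark: RTE (relative translation error) measures how far the estimated transform is from a known ground-truth transform, whereas inlier RMSE measures the residual distances between the clouds after alignment. The two being of similar magnitude is encouraging, but on its own it does not establish that the pipeline generalizes to real-world industrial objects.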
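To make the difference concrete: the usual transform-error metrics compare an estimated pose with a ground-truth pose, which this real-world scan does not provide. A small sketch of the common definitions (translation error as a Euclidean distance, rotation error as the angle of the residual rotation); whether the BUFFER-X benchmark code computes them exactly this way is an assumption, and `rte_rre` is a hypothetical name.

```python
import numpy as np


def rte_rre(T_est: np.ndarray, T_gt: np.ndarray):
    """Relative translation error (units of T) and relative rotation error (degrees).

    Both inputs are 4x4 rigid transforms (rotation part without scale).
    """
    rte = float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))
    cos_angle = (np.trace(T_est[:3, :3].T @ T_gt[:3, :3]) - 1.0) / 2.0
    rre = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return rte, rre
```

Note that neither quantity looks at the point clouds at all, while inlier RMSE looks only at the point clouds.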

Median distance of 2.6 cm means half of all reconstruction points lie within 2.6 cm of the ground truth surface — good accuracy for a real-world scan of an industrial object.
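The statistics in the ICP table can be recomputed from nearest-neighbour distances between the ICP-aligned reconstruction and the ground truth. A minimal numpy sketch, assuming fitness and inlier RMSE follow Open3D's definitions (inliers are source points whose nearest target point lies within the correspondence distance; fitness is inliers divided by all source points) and that the overall RMSE, median, and percentiles are taken over all source points. The threshold the pipeline actually uses is not stated here, so `max_dist` is a placeholder, and the function name is hypothetical.

```python
import numpy as np


def alignment_metrics(source, target, max_dist: float) -> dict:
    """Summarize point-to-point nearest-neighbour distances from aligned source to target.

    Brute-force nearest neighbours: only suitable for small clouds (use a KD-tree otherwise).
    """
    src = np.asarray(source, dtype=float)
    dst = np.asarray(target, dtype=float)
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    d = np.sqrt(d2.min(axis=1))          # distance to the closest target point
    inlier = d < max_dist
    return {
        "fitness": float(inlier.mean()),
        "inlier_rmse": float(np.sqrt((d[inlier] ** 2).mean())) if inlier.any() else 0.0,
        "overall_rmse": float(np.sqrt((d ** 2).mean())),
        "median": float(np.median(d)),
        "p90": float(np.percentile(d, 90)),
        "p95": float(np.percentile(d, 95)),
        "max": float(d.max()),
    }
```

The gap between inlier RMSE and overall RMSE, and between the median and P95, is what separates the well-aligned core from the outlier tail.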


All 4 Stages Side by Side

1. Raw

2. Manual

3. BUFFER-X

4. ICP Final


Pipeline Output Structure

After the run, all outputs are saved automatically:

runs/test_bufferx_before_icp/air_conditioner_control_camera3/
├── raw/
│   ├── recon_input.ply       ← original VGGT reconstruction
│   └── gt_input.ply          ← ground truth model
├── manual/
│   ├── recon_cropped.ply     ← after Open3D crop
│   └── recon_manual.ply      ← after point-picking alignment
├── bufferx/
│   ├── init_transform.txt    ← BUFFER-X 4x4 transform matrix
│   └── init_summary.json     ← BUFFER-X metrics
├── icp/
│   ├── icp_transform.txt     ← ICP 4x4 transform matrix
│   ├── icp_summary.json      ← ICP metrics
│   └── aligned_source_icp.ply ← final aligned reconstruction
├── viz/
│   ├── 01_raw_overlay.png
│   ├── 02_manual_overlay.png
│   ├── 03_bufferx_overlay.png
│   └── 04_icp_overlay.png
└── scene_status.json         ← overall run status

Check run status:

cat runs/test_bufferx_before_icp/air_conditioner_control_camera3/scene_status.json
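The saved transforms can also be loaded for further analysis. A minimal sketch, assuming init_transform.txt and icp_transform.txt hold plain whitespace-separated 4x4 matrices (as `numpy.savetxt` would write them). Whether the ICP matrix already includes the BUFFER-X initialization or must be composed with it is not stated in this post, so check the pipeline before chaining them. The helper names are hypothetical, and scene_status.json is printed as-is because its keys are not documented here.

```python
import json
from pathlib import Path

import numpy as np


def load_transform(source) -> np.ndarray:
    """Read a 4x4 homogeneous transform saved as whitespace-separated text.

    `source` can be anything numpy.loadtxt accepts (path, open file, list of lines).
    """
    T = np.loadtxt(source, dtype=float)
    if T.shape != (4, 4) or not np.allclose(T[3], [0.0, 0.0, 0.0, 1.0]):
        raise ValueError("expected a 4x4 homogeneous transform")
    return T


def apply_transform(T: np.ndarray, points) -> np.ndarray:
    """Apply T to an (N, 3) array of points."""
    return np.asarray(points, dtype=float) @ T[:3, :3].T + T[:3, 3]


if __name__ == "__main__":
    scene = Path("runs/test_bufferx_before_icp/air_conditioner_control_camera3")
    icp_file = scene / "icp" / "icp_transform.txt"
    status_file = scene / "scene_status.json"
    if icp_file.exists():
        print("ICP transform:\n", load_transform(icp_file))
    if status_file.exists():
        print(json.dumps(json.loads(status_file.read_text()), indent=2))
```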

Why BUFFER-X Before ICP Matters

| Approach | ICP result |
|---|---|
| ICP alone (no initialization) | Diverges to a wrong local minimum |
| Manual initialization only | Better, but still rough |
| BUFFER-X initialization → ICP | Fitness 0.817, inlier RMSE 2.87 cm ✅ |

BUFFER-X provides a reliable coarse alignment that places ICP inside the correct convergence basin. ICP only converges locally: started outside that basin, it settles into a wrong local minimum and produces poor results, and running more iterations does not fix this.


Challenges and Solutions

Challenge 1: Scale mismatch. The VGGT reconstruction scale differs from the real-world ground-truth scale. Solution: --open3d-pre-scale aabb_diag automatically estimates and applies a scale correction before point picking.

Challenge 2: Background clutter. Raw VGGT output includes the table, walls, and surrounding environment. Solution: the Open3D interactive crop isolates only the target object.

Challenge 3: Coordinate frame mismatch. VGGT and the ground truth use different coordinate conventions. Solution: manual point picking yields an initial transform that brings the reconstruction close to the ground-truth frame; BUFFER-X then estimates its own correspondences from learned descriptors on this pre-aligned cloud.

Challenge 4: Low-overlap regions. Some parts of the air conditioner were not captured from all angles. Solution: reporting the P95 (9.0 cm) and maximum distance alongside the median (2.6 cm) separates the error tail caused by these regions from the well-aligned core.


Key Takeaway

The BUFFER-X → ICP pipeline achieves 81.7% fitness and 2.87 cm inlier RMSE on a real-world industrial object — without any domain-specific retraining. The pipeline is generalizable: the same workflow works on Isaac Sim synthetic data and real-world scans.

The ICP Fitness and RMSE scores produced by this pipeline can feed directly into our PPO reinforcement learning reward function, giving the RL agent a quantitative signal for reconstruction quality at each viewpoint.
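As an illustration of how the two metrics could be combined, here is a hypothetical shaped reward; the weights, the 5 cm normalization scale, and the function name are assumptions for this sketch and not the reward actually used in our PPO setup.

```python
def reconstruction_reward(fitness: float, inlier_rmse_m: float,
                          rmse_scale_m: float = 0.05,
                          w_fitness: float = 1.0, w_rmse: float = 1.0) -> float:
    """Hypothetical shaped reward from ICP metrics (higher is better).

    fitness is in [0, 1]; the inlier RMSE (meters) is normalized by rmse_scale_m and
    capped at 1 so a single failed alignment cannot dominate the episode return.
    """
    rmse_term = min(inlier_rmse_m / rmse_scale_m, 1.0)
    return w_fitness * fitness - w_rmse * rmse_term
```

With the default weights, this scene's values (fitness 0.817, inlier RMSE 0.0287 m) would give a reward of about 0.243. Keeping both terms bounded makes the scale of the reward predictable, which tends to matter for PPO's advantage normalization.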


Next Steps

  • Test on multiple objects from the dataset
  • Compare BUFFER-X → ICP vs ICP-only baseline quantitatively
  • Integrate pipeline output as RL reward signal in Isaac Lab
  • Automate the crop step using SAM3D segmentation masks

Resources

