Taha Mohammed

Path Matters: What We Built, What We Found, and What Comes Next

2026-04-14T00:00:00+02:00

After six months of work, the Path Matters project is submitted. This post is a summary of what we built, the results we got, and what I personally learned from it. The full report is available through TU Berlin.

The project ran as an automation engineering course at TU Berlin (WiSe 25/26), supervised by Adam Altenbuchner at the Institut für Werkzeugmaschinen und Fabrikbetrieb. The team was five engineers: Artem Balatsiuk, Aziz Louati, Haroun Lallouche, Taha Mohammed (me), and Ziad Abouhalawa. The question we were trying to answer was simple to state and hard to answer: does the choice and sequence of camera viewpoints significantly affect the quality of 3D reconstruction from a robotic scanner?

What we built

The system is a modular pipeline connecting a UR5e robot arm in NVIDIA Isaac Sim, controlled through ROS2 Humble and MoveIt2, to three reconstruction models (VGGT, Fast3R, SAM3D), and an evaluation pipeline using ICP and BUFFER-X to measure alignment quality against ground-truth meshes. My responsibility was trajectory planning — designing the motion strategies, implementing the ROS2 control pipeline, and running the scanning experiments.

The pipeline has four independent modules that communicate through a shared directory structure: trajectory (viewpoint generation and execution), data (image export and point cloud management), reconstruction (wrapping all three models behind a common interface), and evaluation (ICP registration and metric computation). This design made it possible to swap any component and compare results systematically.

Reconstruction: VGGT wins clearly

We evaluated VGGT, Fast3R, and SAM3D on 28 objects drawn from Isaac Sim synthetic data, Google Scanned Objects, T-LESS and Linemod benchmarks, and real-world hand-held captures. Each method received the same set of 12 images from a fixed hemispheric trajectory, and we measured ICP Fitness and RMSE after registration against the ground-truth mesh.

VGGT: Fitness 0.93, RMSE 0.002 m. Fast3R: Fitness 0.89, RMSE 0.010 m. SAM3D: Fitness 0.91, RMSE 0.008 m. All three finish inference in under 10 seconds on our RTX A6000. VGGT wins on both metrics and became the primary reconstruction backbone for all subsequent experiments.

One practical complication: VGGT and Fast3R produce reconstructions in an arbitrary scene-relative scale with no absolute metric information. Before registration, you need to estimate the scale factor. We used a median consensus approach combining three estimators — bounding box diagonal ratio, PCA axis length ratios, and convex hull volume ratio. In extreme mismatch cases, this improved ICP Fitness from 0.0 to 1.0. Without scale correction, the alignment metrics are meaningless.
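
A minimal sketch of the median-consensus idea, assuming both clouds are N×3 NumPy arrays. The function names are mine, not the project's, and how the three PCA axis ratios are collapsed into one number (here: their median) is an assumption. The convex hull estimator needs SciPy and is simply skipped if SciPy is not installed.

```python
import numpy as np

try:  # SciPy is optional here; the hull estimator is skipped without it
    from scipy.spatial import ConvexHull
except ImportError:
    ConvexHull = None


def bbox_diagonal(points):
    """Length of the axis-aligned bounding box diagonal."""
    return float(np.linalg.norm(points.max(axis=0) - points.min(axis=0)))


def pca_axis_lengths(points):
    """Standard deviations along the principal axes, largest first."""
    centred = points - points.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centred.T))  # ascending order
    return np.sqrt(np.clip(eigvals[::-1], 0.0, None))


def estimate_scale(recon, gt):
    """Median of up to three scale estimates (ground-truth size / reconstruction size)."""
    estimates = [bbox_diagonal(gt) / bbox_diagonal(recon)]
    # per-axis ratio of PCA extents, summarised by its median (assumption)
    estimates.append(float(np.median(pca_axis_lengths(gt) / pca_axis_lengths(recon))))
    if ConvexHull is not None:
        # volumes scale with the cube of the length scale
        ratio = ConvexHull(gt).volume / ConvexHull(recon).volume
        estimates.append(float(ratio ** (1.0 / 3.0)))
    return float(np.median(estimates))


# Synthetic check: the same shape at an arbitrary scale
rng = np.random.default_rng(0)
gt_cloud = rng.uniform(-1.0, 1.0, size=(500, 3)) * [0.3, 0.2, 0.1]
recon_cloud = gt_cloud / 2.5
scale = estimate_scale(recon_cloud, gt_cloud)
recon_rescaled = recon_cloud * scale
```

The appeal of the median is that a single estimator thrown off by leftover background points or a very thin object can be outvoted by the other two.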

Registration: ICP outperforms BUFFER-X here, but context matters

For our evaluation setting (preprocessed point clouds with a reasonable initial alignment), ICP with FPFH initialisation outperformed BUFFER-X on all quality metrics: Fitness 0.87 versus 0.81, RMSE 0.0041 m versus 0.0063 m, and recall (share of objects with Fitness above 0.8) of 78.6% versus 67.9%. BUFFER-X is faster (1.8 s versus 3.2 s) and does not require a coarse initial alignment, which makes it useful when RANSAC fails. The two methods are complementary rather than competing.

Trajectory: camera orientation is the biggest lever

I ran five scanning patterns (Lawnmower, Zigzag, Hemisphere, Spiral, Random) under two camera orientation strategies. Approach 1: camera always points straight down. Approach 2: camera dynamically points toward the detected object centre at every waypoint.
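
For illustration, here is one way to generate a Hemisphere pattern in plain Python: viewpoints on the upper half of a sphere around the object centre, spread with a golden-angle spiral. The function name, the centre, the radius, and the minimum elevation are illustrative assumptions, not the values used in the experiments.

```python
import math


def hemisphere_viewpoints(centre, radius, n, min_elevation_deg=20.0):
    """Return n camera positions on the upper hemisphere around `centre`."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    z_min = math.sin(math.radians(min_elevation_deg))
    points = []
    for i in range(n):
        # evenly spaced heights give equal-area bands on a sphere
        z = z_min + (1.0 - z_min) * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        azimuth = i * golden_angle
        points.append((
            centre[0] + radius * ring * math.cos(azimuth),
            centre[1] + radius * ring * math.sin(azimuth),
            centre[2] + radius * z,
        ))
    return points


# Example values only: 12 viewpoints, 35 cm from a centre at (0.3, 0.0, 0.2)
views = hemisphere_viewpoints(centre=(0.3, 0.0, 0.2), radius=0.35, n=12)
```

Every generated position still has to pass IK and collision checks before it becomes a waypoint, so in practice some candidates may be dropped.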

Approach 1 mean ICP Fitness across all patterns: 0.68. Approach 2 mean: 0.79. The improvement was consistent across every pattern without changing the number of viewpoints or the trajectory geometry. The best single result was Hemisphere with Approach 2: Fitness 0.86, RMSE 0.015 m.

This was the clearest finding of the trajectory experiments: camera orientation matters more than trajectory pattern at this scale. Which pattern you use — lawnmower, hemisphere, spiral — has a secondary effect compared to whether the camera is actually looking at the object from each position. The implementation cost of dynamic object-pointing orientation is modest (slightly more complex IK solutions, around 5% IK failure rate versus 0% for fixed downward, and 0.5 s longer stabilisation pause), and the quality gain is substantial.
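
The object-pointing orientation of Approach 2 boils down to a look-at computation. The sketch below assumes the camera looks along the +Z axis of its own frame and returns a quaternion in (x, y, z, w) order; both conventions, and the function names, are assumptions for illustration and not taken from our code.

```python
import math


def _normalise(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def look_at_quaternion(position, target, up=(0.0, 0.0, 1.0)):
    """Quaternion (x, y, z, w) whose rotated +Z axis points from position to target."""
    z_axis = _normalise(tuple(t - p for t, p in zip(target, position)))
    # looking (almost) along `up` is degenerate, so fall back to the world X axis
    if abs(sum(a * b for a, b in zip(z_axis, up))) > 0.999:
        up = (1.0, 0.0, 0.0)
    x_axis = _normalise(_cross(up, z_axis))
    y_axis = _cross(z_axis, x_axis)
    r = [[x_axis[i], y_axis[i], z_axis[i]] for i in range(3)]  # axes as columns
    trace = r[0][0] + r[1][1] + r[2][2]
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)
        return ((r[2][1] - r[1][2]) / s, (r[0][2] - r[2][0]) / s, (r[1][0] - r[0][1]) / s, 0.25 * s)
    if r[0][0] > r[1][1] and r[0][0] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[0][0] - r[1][1] - r[2][2])
        return (0.25 * s, (r[0][1] + r[1][0]) / s, (r[0][2] + r[2][0]) / s, (r[2][1] - r[1][2]) / s)
    if r[1][1] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[1][1] - r[0][0] - r[2][2])
        return ((r[0][1] + r[1][0]) / s, 0.25 * s, (r[1][2] + r[2][1]) / s, (r[0][2] - r[2][0]) / s)
    s = 2.0 * math.sqrt(1.0 + r[2][2] - r[0][0] - r[1][1])
    return ((r[0][2] + r[2][0]) / s, (r[1][2] + r[2][1]) / s, 0.25 * s, (r[1][0] - r[0][1]) / s)


def optical_axis(q):
    """Direction of the rotated +Z axis, useful as a sanity check."""
    x, y, z, w = q
    return (2.0 * (x * z + w * y), 2.0 * (y * z - w * x), 1.0 - 2.0 * (x * x + y * y))
```

Approach 1 corresponds to calling this with a target directly below the camera at every waypoint. If the camera frame uses a different viewing axis, the axes have to be permuted accordingly.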

Reinforcement learning: proximity shaping solves the sparse reward problem

The RL component uses Isaac Lab with 16 parallel UR5e environments, PPO, and an 86-dimensional state space including joint positions, camera pose, coverage percentage, and an 8x8 downsampled voxel coverage map. The agent selects continuous joint position deltas; images are captured automatically every 20 steps.

First experiment (exp_06): coverage rewards only, no signal toward the scanning volume. Task success rate: 0.4%. The robot occasionally stumbled into good positions by chance but could not reproduce the behaviour.

Second experiment (exp_07): added three proximity shaping rewards providing a continuous gradient toward the workspace — a proximity gradient toward the volume centre, a binary reward for being inside the workspace bounds, and a dot-product reward for facing the volume. Task success rate: 45.2%. Coverage: 75% or more per episode. Versus random exploration: 3.6x higher coverage, 75%+ versus 20.6%.

The practical lesson is that proximity shaping is not a nice-to-have: it is necessary for this environment. Without a gradient guiding the robot toward the scanning volume, the coverage rewards never activate and learning stalls. The agent needs to learn to get close before it can learn to scan.
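
To make the three shaping terms concrete, here is a hedged sketch in plain Python. The functional forms (inverse distance, axis-aligned box, cosine) and the weights are assumptions chosen for illustration; the values used in exp_07 are not reproduced here.

```python
import math


def proximity_shaping(cam_pos, cam_forward, centre, half_extent,
                      w_dist=1.0, w_inside=0.5, w_facing=0.5):
    """Sum of three shaping terms; weights and forms are illustrative."""
    offset = [c - p for c, p in zip(centre, cam_pos)]
    dist = math.sqrt(sum(o * o for o in offset))
    # 1) smooth gradient toward the volume centre, in (0, 1]
    r_dist = 1.0 / (1.0 + dist)
    # 2) binary bonus for being inside the axis-aligned workspace box
    inside = all(abs(o) <= h for o, h in zip(offset, half_extent))
    r_inside = 1.0 if inside else 0.0
    # 3) cosine between viewing direction and direction to the centre
    if dist > 1e-9:
        norm_f = math.sqrt(sum(f * f for f in cam_forward))
        r_facing = sum(f * o for f, o in zip(cam_forward, offset)) / (norm_f * dist)
    else:
        r_facing = 0.0
    return w_dist * r_dist + w_inside * r_inside + w_facing * r_facing


centre, box = (0.3, 0.0, 0.2), (0.4, 0.4, 0.4)
near = proximity_shaping((0.3, 0.0, 0.5), (0.0, 0.0, -1.0), centre, box)
far = proximity_shaping((2.0, 2.0, 2.0), (0.0, 0.0, 1.0), centre, box)
```

The first and third terms are non-zero almost everywhere, which is what gives the policy a gradient long before any coverage reward fires.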

There is a known limitation: the RL agent maximises geometric voxel coverage (frustum-based), not actual reconstruction quality. Coverage and ICP Fitness are correlated but not identical. The obvious next step — and the open research question the project leaves behind — is to replace the geometric coverage metric with direct ICP Fitness or VGGT confidence as the reward signal.

What I would do differently

The preprocessing pipeline (RANSAC plane removal, DBSCAN clustering) failed on roughly 15% of objects — either removing part of the object or not finding the table plane. These failures propagated into degraded scale estimation and registration. More robust preprocessing, or replacing it with a learned segmentation approach like YOLO-based semantic crop, would have improved the tail of the distribution.

The RL training stability was also not fully resolved. The exp_07 task success rate peaks near 100% around iteration 1500 before settling at 40–45%. For deployment, you would need to select the peak checkpoint rather than the final one. Entropy coefficient scheduling and learning rate decay would likely improve convergence.

What comes next

For me personally, this project confirmed that I want to keep working on robotic perception and learning-based control, specifically the problems around sim-to-real transfer and direct optimisation of reconstruction quality through robot behaviour. The trajectory planning work connects directly to the reward signal design question in the RL component, and both connect to the calibration work I did at Fraunhofer IPK. I am looking for a PhD position where I can develop these threads further.

The full report, code, and results are available on request. If you are working on related problems and want to discuss, feel free to reach out.

VGGT → Open3D → BUFFER-X → ICP: Full Registration Pipeline

2026-03-17T00:00:00+01:00

This post documents the full 4-stage registration pipeline developed as part of the Path Matters project at TU Berlin. Starting from a VGGT reconstruction, we manually clean and align the point cloud using Open3D, then run BUFFER-X for coarse global registration, and finally refine with ICP — producing quantitative alignment metrics at every stage.

Overview

A key challenge in robotic 3D reconstruction is aligning a reconstructed point cloud to a ground truth model for quality evaluation. This pipeline solves that end-to-end:

The 4-stage pipeline:

Raw reconstruction → Manual crop + alignment → BUFFER-X initial alignment → ICP final refinement → Metrics

Test object: Air conditioner control unit — real-world scan from our Reallife Dataset

System Setup

Component	Detail
Reconstruction	VGGT (Visual Geometry Grounded Transformer)
Manual alignment	Open3D interactive crop + point picking
Initial registration	BUFFER-X (zero-shot, threedmatch model)
Final refinement	ICP Point-to-Plane
Environment	`bufferx_o3d` conda environment
Dataset	Reallife_Dataset — air_conditioner_control_camera3

How to Run

Activate environment

conda activate bufferx_o3d

Run the full pipeline

python3 /home/AP_PathMatters/path_matters/haroun/Pipeline/cc_bufferx_pipeline_package/run_cc_bufferx_pipeline.py \
  --recon-root /home/AP_PathMatters/path_matters/datasets/Reallife_Dataset_Haroun_Aziz/scenes-others_SUBSAMPLED \
  --gt-root /home/AP_PathMatters/path_matters/datasets/Reallife_Dataset_Haroun_Aziz/scenes-others_SUBSAMPLED \
  --output-base /home/AP_PathMatters/path_matters/haroun/runs \
  --run-name test_bufferx_before_icp \
  --bufferx-root /home/AP_PathMatters/BUFFER-X \
  --bufferx-env bufferx_o3d \
  --scene-names air_conditioner_control_camera3 \
  --recon-candidates recon_generated/vggt/points.ply \
  --manual-mode require \
  --manual-backend open3d \
  --open3d-crop \
  --open3d-pre-scale aabb_diag \
  --open3d-with-scaling \
  --save-viz \
  --show-final-viz

Stage 1 — Raw Overlay

The raw VGGT reconstruction and ground truth are loaded and overlaid before any alignment. The two clouds are in completely different coordinate frames at this stage.

Stage 1: Raw overlay — reconstruction (yellow) vs ground truth (cyan), unaligned

Stage 2 — Manual Open3D Alignment

Before BUFFER-X runs, the reconstruction is manually cleaned and coarsely aligned using Open3D’s interactive tools.

Step 1 — Crop window

The Open3D crop window removes background clutter from the reconstruction, isolating only the target object.

Controls used:

Y twice — align view
K — enter selection mode
Drag mouse — rectangle selection around object
C — crop
Q — continue

Step 2 — Point picking

After cropping, at least 3 corresponding points are picked manually between source and target to compute an initial transform.

Controls used:

Shift + Left Click — pick point
Shift + Right Click — undo
Minimum 3 points in source, then 3 matching points in target
Q — continue
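
For reference, the transform that can be computed from picked point pairs is the closed-form least-squares similarity fit (Umeyama's method). The sketch below uses NumPy only; the function name is mine, and whether the pipeline script computes the transform in exactly this way is an assumption.

```python
import numpy as np


def similarity_from_correspondences(src, dst, with_scaling=True):
    """4x4 transform T with dst ≈ T applied to src, from matched point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    u, sing, vt = np.linalg.svd(cov)
    # flip the last axis if needed so the result is a rotation, not a reflection
    d = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0
    rot = u @ np.diag(d) @ vt
    var_src = (src_c ** 2).sum() / len(src)
    scale = float((sing * d).sum() / var_src) if with_scaling else 1.0
    transform = np.eye(4)
    transform[:3, :3] = scale * rot
    transform[:3, 3] = mu_dst - scale * rot @ mu_src
    return transform
```

Three non-collinear pairs are the minimum for a unique solution; picking a few more spreads out the effect of an imprecise click.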

Stage 2: After manual crop and Open3D point-picking alignment

Key setting — pre-scale: --open3d-pre-scale aabb_diag automatically scales the reconstruction to roughly match the ground truth bounding box diagonal before point picking — critical when VGGT output scale differs from real-world scale.

Stage 3 — BUFFER-X Initial Alignment

After manual alignment, BUFFER-X performs global registration to refine the coarse initial pose into a reliable starting point for ICP.

Stage 3: After BUFFER-X global registration

BUFFER-X Results

Metric	Value
Source points	28,032
Target points	30,000
Voxel size	0.0051 m
Sphericity	0.0246
Inference time	~0.51 s
Model	threedmatch (zero-shot)

Sphericity of 0.025 confirms the object has strong geometric structure — low sphericity means the point cloud has distinctive directional features, making descriptor matching reliable.

Stage 4 — ICP Final Refinement

BUFFER-X output is used as the initial transform for ICP, which iteratively refines the alignment. On this object it reaches an inlier RMSE of 2.87 cm.

Stage 4: Final result after ICP Point-to-Plane refinement

ICP Results

Metric	Value	Interpretation
ICP Fitness	0.817	81.7% of points matched ✅
ICP Inlier RMSE	0.0287 m	2.87 cm on matched points
Overall RMSE	0.051 m	5.1 cm including outliers
Median distance	0.026 m	2.6 cm — half of points within this
P90 distance	0.067 m	90% of points within 6.7 cm
P95 distance	0.090 m	95% of points within 9.0 cm
Max distance	0.284 m	Worst-case outlier
ICP mode	Point-to-Plane	Standard for smooth surfaces
Point count	28,555	Total evaluated correspondences

What these numbers mean

Fitness of 0.817 is a strong result — 81.7% of reconstruction points found a valid match in the ground truth model. This is significantly better than our earlier Baby Yoda test (0.33) because the background was properly removed before registration.

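Inlier RMSE of 2.87 cm is the root-mean-square error over matched points. It is of the same order of magnitude as the 5.79 cm RTE we measured for BUFFER-X on 3DMatch, but the two are different quantities (point-to-surface residual versus translation error of the estimated pose), so this is an indication, not proof, that the pipeline carries over to real-world industrial objects.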

Median distance of 2.6 cm means half of all reconstruction points lie within 2.6 cm of the ground truth surface — good accuracy for a real-world scan of an industrial object.
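
How the rows of the table relate to each other is easiest to see in code. The sketch below starts from a list of nearest-neighbour distances (source point to closest target point) and derives the same kinds of metrics. The inlier threshold and the nearest-rank percentile are assumptions; the threshold used in our run is not stated in this post.

```python
import math
import statistics


def alignment_metrics(distances, inlier_threshold):
    """Summarise nearest-neighbour distances (metres) from source to target."""
    ordered = sorted(distances)
    inliers = [d for d in ordered if d <= inlier_threshold]

    def percentile(p):
        # nearest-rank percentile, adequate for a sketch
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]

    def rmse(values):
        return math.sqrt(sum(v * v for v in values) / len(values)) if values else float("nan")

    return {
        "fitness": len(inliers) / len(ordered),   # share of source points with a match
        "inlier_rmse": rmse(inliers),             # error on matched points only
        "overall_rmse": rmse(ordered),            # error including outliers
        "median": statistics.median(ordered),
        "p90": percentile(90),
        "p95": percentile(95),
        "max": ordered[-1],
    }


# Toy example: four well-aligned points and one background outlier
demo = alignment_metrics([0.01, 0.02, 0.02, 0.03, 0.30], inlier_threshold=0.05)
```

The toy example shows why Fitness and inlier RMSE should be read together: the single outlier lowers Fitness but does not enter the inlier RMSE at all.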

All 4 Stages Side by Side

1. Raw

2. Manual

3. BUFFER-X

4. ICP Final

Pipeline Output Structure

After the run, all outputs are saved automatically:

runs/test_bufferx_before_icp/air_conditioner_control_camera3/
├── raw/
│   ├── recon_input.ply       ← original VGGT reconstruction
│   └── gt_input.ply          ← ground truth model
├── manual/
│   ├── recon_cropped.ply     ← after Open3D crop
│   └── recon_manual.ply      ← after point-picking alignment
├── bufferx/
│   ├── init_transform.txt    ← BUFFER-X 4x4 transform matrix
│   └── init_summary.json     ← BUFFER-X metrics
├── icp/
│   ├── icp_transform.txt     ← ICP 4x4 transform matrix
│   ├── icp_summary.json      ← ICP metrics
│   └── aligned_source_icp.ply← final aligned reconstruction
├── viz/
│   ├── 01_raw_overlay.png
│   ├── 02_manual_overlay.png
│   ├── 03_bufferx_overlay.png
│   └── 04_icp_overlay.png
└── scene_status.json         ← overall run status

Check run status:

cat runs/test_bufferx_before_icp/air_conditioner_control_camera3/scene_status.json

Why BUFFER-X Before ICP Matters

Approach	ICP Result
ICP alone (no init)	Diverges — wrong local minimum
Manual init only	Better but still rough
BUFFER-X init → ICP	Fitness 0.817, RMSE 2.87 cm ✅

BUFFER-X provides a reliable coarse alignment that puts ICP in the correct convergence basin. Without it, ICP gets stuck in the wrong local minimum and produces garbage results regardless of how many iterations it runs.

Challenges and Solutions

Challenge 1: Scale mismatch. VGGT reconstruction scale differs from real-world ground truth scale. Solution: --open3d-pre-scale aabb_diag automatically estimates and applies a scale correction before point picking.

Challenge 2: Background clutter. Raw VGGT output includes table, walls, and surrounding environment. Solution: Open3D interactive crop isolates only the target object.

Challenge 3: Coordinate frame mismatch. VGGT and ground truth use different coordinate conventions. Solution: manual point picking establishes 3D correspondences for a coarse initial transform, which BUFFER-X then refines.

Challenge 4: Low overlap regions. Some parts of the air conditioner were not captured from all angles. Solution: the P95 distance (9.0 cm) exposes these outlier regions separately from the well-aligned core (median 2.6 cm).

Key Takeaway

The BUFFER-X → ICP pipeline achieves 81.7% fitness and 2.87 cm inlier RMSE on a real-world industrial object — without any domain-specific retraining. The pipeline is generalizable: the same workflow works on Isaac Sim synthetic data and real-world scans.

The ICP Fitness and RMSE scores produced by this pipeline can feed directly into our PPO reinforcement learning reward function, giving the RL agent a quantitative signal for reconstruction quality at each viewpoint.

Next Steps

Test on multiple objects from the dataset
Compare BUFFER-X → ICP vs ICP-only baseline quantitatively
Integrate pipeline output as RL reward signal in Isaac Lab
Automate the crop step using SAM3D segmentation masks

Resources

BUFFER-X: Zero-Shot Point Cloud Registration in Isaac Sim

2026-03-16T00:00:00+01:00

BUFFER-X Point Cloud Registration

As part of the Path Matters project at TU Berlin, this week I integrated BUFFER-X into our robotic 2D→3D reconstruction pipeline. BUFFER-X is a zero-shot point cloud registration method published as an ICCV 2025 Highlight paper by MIT SPARK Lab. It aligns two 3D point clouds into the same coordinate frame without any retraining or fine-tuning — across any scene or sensor type.

Video Walkthrough

The Problem We Were Solving

Our pipeline uses a UR5e robot arm with a simulated Basler camera in NVIDIA Isaac Sim to capture multiple views of an object and reconstruct it in 3D. The reconstruction methods we tested — VGGT, Fast3R, and SAM3D — each produce point clouds in their own coordinate frames.

The challenge: ICP (Iterative Closest Point) alignment was failing because it had no reliable initial pose to start from. Without a good starting point, ICP diverges and gives wrong results.

BUFFER-X solves this by providing a robust initial alignment before ICP runs.

Where BUFFER-X Fits in Our Pipeline

Isaac Sim Scene → Basler Camera Capture → .PLY Export + Ground Truth Pose → 2D→3D Reconstruction → Preprocessing → BUFFER-X Initial Alignment → ICP Refinement → Fitness / RMSE Score → RL Reward Signal

What is BUFFER-X?

BUFFER-X (Balanced Unified Feature-based Framework for Extended Registration) works in two stages:

Descriptor stage — extracts local geometric features using a PointNet++ backbone with multi-scale grouping
Pose estimation stage — uses RANSAC or KISS-Matcher to find the optimal 6-DoF rigid transformation T ∈ SE(3)

Key facts:

Only 0.91M trainable parameters — extremely lightweight
~1 second per point cloud pair
Trained once on indoor RGB-D data — works on outdoor LiDAR, synthetic Isaac Sim data, and more
ICCV 2025 Highlight · MIT SPARK Lab

Benchmark Results — 3DMatch Indoor Dataset

I ran the full 3DMatch benchmark (1623 point cloud pairs) on our machine with an NVIDIA RTX A6000.

Metric	Result
Recall	97.1%
RMSE Recall	95.2%
RTE	5.79 cm
RRE	1.80°
Failed pairs	47 / 1623
Inference time	~0.89 s per pair
Total runtime	28 minutes

Why did 47 pairs fail?

1. Symmetric scenes (majority of failures) These have RRE near 90°, 125°, or exactly 180° — the scene geometry is rotationally symmetric (empty corridors, white walls, repetitive patterns). No registration method can reliably distinguish these orientations.

2. Low-overlap pairs The two scans share very little overlapping geometry. With insufficient matching surface, no reliable transformation can be estimated.

These are fundamental data challenges, not model failures.
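
For readers unfamiliar with RRE and RTE, the sketch below shows how the two errors are usually computed from an estimated and a ground-truth pose. The success thresholds (15° and 0.3 m) are values commonly used on 3DMatch; the thresholds applied by the benchmark script in this run are not stated here, so treat the defaults as assumptions.

```python
import math


def rotation_error_deg(r_est, r_gt):
    """Angle of the relative rotation R_gt^T R_est, in degrees (RRE)."""
    # trace(R_gt^T R_est) equals the sum of element-wise products
    trace = sum(r_gt[i][j] * r_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between the two translations (RTE)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))


def is_success(r_est, t_est, r_gt, t_gt, rre_max_deg=15.0, rte_max_m=0.3):
    return (rotation_error_deg(r_est, r_gt) <= rre_max_deg
            and translation_error(t_est, t_gt) <= rte_max_m)


def rot_z(deg):
    """Rotation about the Z axis, used here only to build test poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

A symmetric scene shows up in these numbers as a small RTE combined with an RRE close to 90° or 180°: the clouds overlap, but in the wrong orientation.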

Custom Test — Isaac Sim Basler Camera

I tested BUFFER-X on our own dataset: a Baby Yoda figurine scanned in Isaac Sim using a simulated Basler camera.

Setup:

Ground truth: Baby_Yoda.ply — 10,000 points, clean object, no color
Reconstruction: points.ply — 100,000 points, full scene with RGB color

Results:

Metric	Value
Fitness	0.3342
RMSE	0.0287 m (2.87 cm)

Before Alignment

After Alignment

Why is Fitness 0.33?

The Fitness score of 0.33 means 33% of points were matched — which looks low but has a clear explanation: points.ply contains 100,000 points including the full Isaac Sim scene background (table, walls, floor), while Baby_Yoda.ply is a clean 10,000-point isolated object model.

The background clutter counts as unmatched points, pulling Fitness down. The RMSE of 2.87 cm on matched points shows the object itself aligned correctly — confirmed visually in the After image above.

Next step: run our preprocessing pipeline (background removal, outlier filtering) on points.ply before comparison.

Key Takeaway

BUFFER-X gives ICP a reliable initial pose — without it, ICP was diverging on our Isaac Sim data. The zero-shot capability means it works on our completely unseen Isaac Sim Basler camera data without any retraining, which is exactly what our pipeline needs.

The Fitness and RMSE scores output by BUFFER-X can also feed directly into our PPO reinforcement learning reward function — replacing the current geometric heuristics with a direct alignment quality signal.

Tools & Setup

Component	Version
OS	Ubuntu 22.04 LTS
Python	3.8
PyTorch	1.9.1+cu111
CUDA	11.1
GPU	NVIDIA RTX A6000 (48 GB)
Open3D	0.13.0

Resources

UR5e Multi-View Trajectory Planning for 3D Reconstruction in Isaac Sim

2026-03-16T00:00:00+01:00

As part of the Path Matters project at TU Berlin, this post documents the first complete pipeline run: a UR5e robot arm executes a planned multi-view camera trajectory of 7 viewpoints in Isaac Sim, captures images along the way (4 in this first run, see Results), and saves them for 3D reconstruction. The system runs entirely in simulation using ROS2, MoveIt2 and Isaac Sim.

Overview

The core question of our Path Matters project is:

How do different viewpoint sequences and trajectory strategies affect 3D reconstruction quality, completeness and efficiency?

This post documents the first building block — getting the UR5e to autonomously move to planned viewpoints and capture images. Everything runs in simulation before transferring to the real robot.

System Architecture

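Three processes run simultaneously, each in its own terminal: the simulation, motion planning, and the trajectory control node. The table also lists the robot, camera, and scene they operate on: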

Component	Tool	Purpose
Simulation	NVIDIA Isaac Sim	Physics + robot + camera
Motion Planning	ROS2 Humble + MoveIt2	IK solving + collision checking
Trajectory Control	Python (ROS2 node)	Waypoints + image capture
Robot	UR5e collaborative arm	6-DOF manipulation
Camera	Simulated Basler RGB-D	Image acquisition
Scene	17_12_robot_plane_graph.usd	Environment + object

How to Run — Step by Step

Terminal 1 — Launch Isaac Sim

Open a terminal and run:

conda deactivate
cd isaacsim/_build/linux-x86_64/release
./isaac-sim.sh

Open scene: 17_12_robot_plane_graph.usd and wait for full load.

Terminal 2 — Launch ROS2 + MoveIt2

Open a second terminal and run:

conda deactivate
ros2 launch ur_moveit_config ur_moveit.launch.py ur_type:=ur5e

Wait until you see MoveIt running in the output.

Terminal 3 — Run Trajectory Script

Open a third terminal and run:

conda deactivate
python3 /home/AP_PathMatters/path_matters/trajectory/scripts/26_01_ur_move.py

The robot begins moving through all 7 viewpoints automatically.

The Trajectory Script — How It Works

The script is a ROS2 node called URPhotoCapture that does four things:

1. Connects to MoveIt2 IK service to convert Cartesian (x,y,z) positions into joint angles for the UR5e.

2. Subscribes to the camera topic /camera/image_raw to receive live images from the simulated Basler camera in Isaac Sim.

3. Uses smooth cubic interpolation between waypoints — cubic easing prevents jerky motion which is important for blur-free image capture.

4. Captures and saves images at each position as .ppm files with timestamp and position name.
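
The interpolation in step 3 can be sketched in a few lines of plain Python. The function names and the default step count are illustrative; the script itself wraps this kind of loop in ROS2 publishing, which is left out here.

```python
def cubic_ease_in_out(t):
    """Map t in [0, 1] to [0, 1] with zero velocity at both ends."""
    if t < 0.5:
        return 4.0 * t ** 3
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0


def interpolate_joints(start, goal, steps=30):
    """Joint-space path from start to goal in `steps` segments, eased in and out."""
    path = []
    for k in range(steps + 1):
        s = cubic_ease_in_out(k / steps)
        path.append([a + s * (b - a) for a, b in zip(start, goal)])
    return path
```

The first and last segments are much shorter than the ones in the middle, so the arm accelerates and decelerates gently instead of jumping between joint targets.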

Viewpoint Design

The 7 viewpoints are arranged around the target object located at approximately (0.3, 0.0, 0.2) in the world frame:

#	Name	X	Y	Z	Purpose
1	Top View	0.30	0.00	0.60	Top-down coverage
2	Front	0.15	0.00	0.30	Front face
3	Right	0.30	0.20	0.30	Right side
4	Back	0.50	0.00	0.30	Back face
5	Left	0.30	-0.20	0.30	Left side
6	Front-Angled	0.20	0.00	0.45	45 degree front angle
7	Right-Angled	0.30	0.15	0.45	45 degree right angle

At all positions the camera points downward toward the object.

Inverse Kinematics — Key Design Decision

A critical insight from debugging: the IK solver requires the world frame, not base_link or other frames. This was discovered through systematic diagnostic testing — using any other frame caused IK failures across all positions.

IK parameters used:

Group name: ur_manipulator
IK link: tool0
Timeout: 5 seconds
Collision avoidance: disabled for testing

Results

Captured Images

Top View

Front View

Right Side

Back View

Metric	Value
Viewpoints planned	7
Images captured	4
Image format	.ppm (RGB8)
Save location	`/path_matters/trajectory/captures/`

Challenges and Solutions

Challenge 1: IK frame mismatch. Initial attempts used the base_link frame and all IK calls failed. Solution: systematic diagnostics confirmed that the world frame is correct.

Challenge 2: Jerky robot motion. Direct joint jumps caused unrealistic motion and camera blur. Solution: cubic ease-in-out interpolation over 30 to 40 steps.

Challenge 3: Camera timing. Moving too fast meant the camera image had not updated before capture. Solution: a 0.8 second delay after reaching each position before capture.

Challenge 4: Shared PC resources. A VGGT Gradio demo running in the background consumed 5 GB+ of GPU memory. Solution: coordinate with teammates to free the GPU before Isaac Sim runs.

Full Pipeline — What Comes Next

Step 1: Trajectory and Image Capture — THIS POST
Step 2: 2D to 3D Reconstruction with VGGT or Fast3R
Step 3: Preprocessing to remove background
Step 4: BUFFER-X Initial Alignment
Step 5: ICP Refinement
Step 6: Fitness and RMSE as RL Reward Signal
Step 7: PPO Training to optimize trajectory

Resources

Vision-Based TCP Calibration for Collaborative Robots Using Deep Learning

2025-03-07T00:00:00+01:00

An intelligent vision-based calibration system developed at Fraunhofer IPK and TU Berlin that reduces industrial robot TCP calibration time by 87.5% while improving accuracy by 76% through deep learning-driven pose selection.

1. Project Overview

Institution: Fraunhofer Institute for Production Systems and Design Technology (IPK), TU Berlin

Duration: 2024-2025

2. Problem Statement

Traditional robot Tool Center Point (TCP) calibration requires 40+ calibration poses and takes ~80 minutes to complete. This is time-consuming in production environments, requires significant operator expertise, and is not optimized for data efficiency.

No existing work systematically analyzed which calibration poses actually contribute to accuracy. The assumption was “more data = better calibration.”

3. Methods and Tools Used

Hardware

Component	Detail
Robot	Universal Robots UR5e collaborative arm
Camera	Azure Kinect RGB-D (depth + color)
Compute	NVIDIA Jetson Orin NX (embedded deployment)
Target	ArUco marker board

Software Stack

Tool	Purpose
ROS2 Humble	Sensor integration, robot control
PyTorch 2.0	Deep learning framework
ResNet-18	CNN architecture for pose quality prediction
OpenCV 4.5	Image processing, ArUco detection
NumPy / SciPy	Numerical computation
Docker	Containerized deployment
Python 3.10	Primary development language

Algorithms

Kinematic calibration: Hand-eye calibration (Tsai-Lenz method)
Optimization: Levenberg-Marquardt for pose refinement
State estimation: RGB-D + kinematics + visual odometry fusion
Pose selection: CNN-based quality scoring

4. System Architecture

Hardware Setup

Azure Kinect RGB-D Camera
         ↓ (USB 3.0)
NVIDIA Jetson Orin NX
         ↓ (Ethernet)
     UR5e Robot Arm
         ↓
   ArUco Marker Board

Software Pipeline

RGB-D Image Acquisition (ROS2 node)
ArUco Marker Detection (OpenCV)
Pose Estimation (PnP algorithm)
CNN Quality Scoring (PyTorch)
Intelligent Pose Selection (top-5 poses)
Kinematic Calibration (hand-eye solver)
TCP Parameter Output
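
The selection step in this list is, at its simplest, a top-k pick over the CNN's quality scores. The sketch below assumes the scores arrive as a dictionary from pose id to predicted quality; the function name and the data layout are hypothetical.

```python
import heapq


def select_top_poses(scores, k=5):
    """Return the ids of the k poses with the highest predicted quality, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [pose_id for pose_id, _ in best]


# Hypothetical scores for seven candidate poses
candidate_scores = {"p1": 0.2, "p2": 0.9, "p3": 0.5, "p4": 0.7, "p5": 0.1, "p6": 0.8, "p7": 0.3}
selected = select_top_poses(candidate_scores, k=5)
```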

5. What I Implemented

Core Modules

thesis_ws/
├── src/
│   ├── camera_driver/
│   │   └── kinect_node.py
│   ├── pose_estimation/
│   │   ├── aruco_detector.py
│   │   └── pose_solver.py
│   ├── calibration/
│   │   ├── cnn_model.py
│   │   ├── pose_selector.py
│   │   ├── hand_eye_calib.py
│   │   └── optimizer.py
│   ├── robot_control/
│   │   └── ur5e_controller.py
│   └── evaluation/
│       ├── metrics.py
│       └── visualization.py

Key Contributions

CNN training pipeline for pose quality prediction
Hardware-in-the-Loop validation framework
Real-time sensor fusion (RGB-D + kinematics + odometry)
Automated calibration workflow with zero human intervention

6. How to Run

Terminal 1 — Launch camera driver

ros2 launch camera_driver kinect.launch.py

Terminal 2 — Launch robot controller

ros2 launch robot_control ur5e_bringup.launch.py

Terminal 3 — Run calibration system

ros2 launch calibration calibration_system.launch.py \
    poses:=5 \
    model:=weights/resnet18_best.pth

Evaluate results

python3 evaluation/metrics.py \
    --data logs/calibration_results.csv \
    --ground-truth data/ground_truth.yaml

7. Results and Metrics

Metric	Conventional (40 poses)	This Work (5 poses)	Improvement
Calibration Time	80 minutes	10 minutes	87.5% faster
RMS Accuracy	20.43 mm	11.82 mm	76% better
Number of Poses	40	5	87.5% fewer
Repeatability	3.2 mm	1.8 mm	44% more stable
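
The Improvement column can be recomputed from the two value columns as a relative reduction, (conventional − this work) / conventional. The short script below does that for each row. With the values as printed, the time, pose count, and repeatability rows reproduce the stated percentages, while the RMS row works out to roughly 42% lower error, so the 76% quoted there presumably rests on a different baseline or definition that is not given in this post.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after` (0.25 means 25% lower)."""
    return (before - after) / before


rows = {
    "Calibration time (min)": (80.0, 10.0),
    "RMS error (mm)": (20.43, 11.82),
    "Number of poses": (40.0, 5.0),
    "Repeatability (mm)": (3.2, 1.8),
}
for name, (conventional, this_work) in rows.items():
    print(f"{name}: {100.0 * relative_reduction(conventional, this_work):.1f}% lower")
```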

Key Findings

5 intelligently selected poses outperform 40 random poses
Pose quality matters more than pose quantity
CNN can predict calibration quality from RGB-D data alone
System runs in real-time on embedded hardware (Jetson Orin NX)

8. Challenges and Solutions

Challenge 1: Real-time performance on Jetson. CNN inference took 250 ms, too slow for closed-loop control requiring under 100 ms. Solution: TensorRT-optimized model plus reduced input resolution, giving 45 ms inference time.

Challenge 2: Sensor synchronization. RGB-D camera, robot encoders, and visual odometry had different update rates. Solution: ROS2 time-synchronized message filter with 50 ms tolerance.

Challenge 3: Marker detection under poor lighting. ArUco detection failed in low-light or high-glare conditions. Solution: adaptive histogram equalization plus multi-scale detection, which improved robustness from 78% to 96%.
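
The idea behind the 50 ms synchronization tolerance can be shown without ROS2: two streams with different rates are paired only when their timestamps are close enough. The sketch below is a simplified, greedy stand-in written in plain Python. In ROS2 this role is usually played by message_filters.ApproximateTimeSynchronizer with a slop of 0.05 s; that the project used exactly that class is an assumption.

```python
def pair_by_timestamp(stream_a, stream_b, tolerance=0.050):
    """Greedy one-to-one pairing of two sorted timestamp lists (seconds)."""
    pairs = []
    j = 0
    for t_a in stream_a:
        # skip entries of stream_b that are too old to match t_a
        while j < len(stream_b) and stream_b[j] < t_a - tolerance:
            j += 1
        if j < len(stream_b) and abs(stream_b[j] - t_a) <= tolerance:
            pairs.append((t_a, stream_b[j]))
            j += 1  # each sample is used at most once
    return pairs


# Example: a fast camera stream and a slower robot-state stream
camera_stamps = [0.000, 0.033, 0.066, 0.100]
robot_stamps = [0.010, 0.120, 0.200]
synced = pair_by_timestamp(camera_stamps, robot_stamps)
```

Samples without a partner inside the tolerance are dropped, which trades a lower effective rate for consistent inputs.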

Key insight: Pose diversity (spatial distribution) matters more than pose quantity. The CNN learned to penalize redundant poses and reward geometrically diverse ones.

9. Key Takeaway

The main takeaway is that intelligent data selection outperforms brute-force data collection in robotic calibration. This challenges the conventional “more data = better” paradigm and shows that asking the right question (which poses matter?) is more valuable than raw computational power.

This methodology transfers to other robotics applications requiring calibration, teaching-by-demonstration, or data-efficient learning.

10. Next Steps

Test on other robot platforms (KUKA, ABB, Fanuc)
Active learning: robot automatically selects next best pose
Transfer learning: CNN pretrained on one robot generalizes to others
Online recalibration: detect calibration drift and auto-correct
Paper submitted to IEEE/RSJ IROS 2026 (under review)
