Saravut Lin · MSc AI, University of Edinburgh
MDPI AI 2026

Deep Learning · 3D Point Cloud Segmentation

Single-Object
Segmentation
in Point Clouds

Five deep learning architectures — from PointNet to Stratified Transformer — evaluated on a real-world robotic grasping task. Which one actually works when the robot needs to pick up a jar of jam?

01 PointNet · 02 PointNet++ · 03 DGCNN · 04 PointWeb · 05 Stratified Transformer

Author

Saravut Lin

Institution

University of Edinburgh

Degree

MSc Artificial Intelligence

Published

MDPI AI, Vol. 7, 2026

Five Architectures

The Evolution of Point Cloud Learning

Each model represents a distinct idea about how to understand 3D geometry.

Dataset: MiniMarket77
Target: Hartley's Strawberry Jam 300g
Scenes: 12,000 × 20,480 pts (XYZ+RGB)
Hardware: Single GPU, PyTorch

Real-World Results

Speed vs. Accuracy in the Real World

Benchmark scores don't tell the whole story. When deployed on ten real-world point cloud scenes captured by eight Intel RealSense RGB-D cameras, the ranking changes dramatically. PointWeb dominates: fastest and cleanest. The Stratified Transformer, despite its benchmark prestige, fails in practice.

PASS
PointWeb · 1.91 s/scene

Crisp boundaries, compact mask, minimal false positives

PARTIAL
DGCNN · 5.44 s/scene

Finds the right region but leaks onto adjacent bottles

FAIL
Stratified Transformer · 115.17 s/scene

Fragmented mask, scattered false positives, impractical latency

Mean Inference Time (seconds, lower is better)

[Bar chart: PointWeb 1.91 s · DGCNN 5.44 s · Stratified Transformer 115.17 s]

Multi-Dimensional Comparison (higher is better)

Axes: Val mIoU · Real-World Quality · Speed · Robustness · Simplicity
  • PointNet
  • PointNet++
  • DGCNN
  • PointWeb
  • Strat. Transformer

Model Compression

Making PointWeb Even Faster

Three compression techniques were applied to the best-performing model. All three preserve near-perfect accuracy while reducing latency. Knowledge distillation is the recommended default — hardware-agnostic and mask-faithful.

Original PointWeb

Latency: 1.91 s · Speed-up: ×1.00 · Val mIoU: 98.6%

Pruning

Latency: 1.81 s · Speed-up: ×1.06 · Val mIoU: 99.2%

INT8 Quantisation

Latency: 1.77 s · Speed-up: ×1.08 · Val mIoU: 99.1%
Fastest

Knowledge Distillation

Latency: 1.78 s · Speed-up: ×1.07 · Val mIoU: 99.0%
Recommended

Latency Comparison (seconds)

[Bar chart: Original PointWeb 1.91 s · Pruning 1.81 s · INT8 Quantisation 1.77 s · Knowledge Distillation 1.78 s]
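The distillation approach recommended above can be sketched in a few lines. The following is an illustrative NumPy version of the standard Hinton-style soft-target loss (temperature-scaled KL term plus hard-label cross-entropy), not the dissertation's actual training code; the function names, `T`, and `alpha` values are assumptions for the sketch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target distillation for per-point binary segmentation.

    student_logits, teacher_logits: (N, 2) per-point class scores.
    labels: (N,) ground-truth labels (0 = background, 1 = target).
    """
    # Soften both distributions with temperature T.
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    # KL(teacher || student), scaled by T^2 per Hinton et al.
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1).mean()
    # Ordinary cross-entropy against the hard labels.
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

In practice the teacher is the full PointWeb model and the student is a smaller network trained on the teacher's softened per-point outputs; the sketch only shows the loss that couples them.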

Published Research

This work is now part of the MDPI AI Journal

The dissertation's dataset and methodology were extended and published as a peer-reviewed article in AI, an open-access journal by MDPI. The paper introduces the MiniMarket80 dataset — an expanded version with 80 grocery objects — and benchmarks 11 state-of-the-art point cloud segmentation methods, establishing a new standard for texture-rich, real-world evaluation.

doi.org/10.3390/ai7030096
Open Access · MDPI AI

Sorour, M.; Rattray, E.; Syahrulfath, A.; Jaramillo, J.; Lin, S.; Webb, B.

The MiniMarket80 Dataset for Evaluation of Unique Item Segmentation in Point Clouds

AI 2026, 7(3), 96
Received: 24 Jan 2026
Accepted: 25 Feb 2026
Published: 6 Mar 2026

Section: AI in Autonomous Systems
Academic Editor: Miguel Angel Cazorla

Abstract

"The effectiveness of deep learning methods in image segmentation has led to interest in their deployment for 3D point cloud segmentation, particularly in the context of pre-grasp identification of a unique object amongst distractors. However, existing 3D object datasets are not ideal for training and evaluation of these methods. [...] We introduce the MiniMarket80 dataset to address this gap. The dataset consists of 1200 colored point cloud partial views, each of 80 standard grocery objects, collected with widely used Realsense RGB-D cameras (D415 and D435) under variable lighting conditions. [...] We use this dataset to evaluate 11 state-of-the-art point cloud segmentation methods. Only four of these are able to (partially) segment the target object in a real-world test, still producing significant false positives and false negatives."

Three Contributions of MiniMarket80

01

Real-World Data

Adds to the limited pool of real-world point cloud datasets proven crucial for successful deployment — not synthetic CAD models.

02

Texture-Rich Benchmark

Serves as a texture-rich benchmark for testing point cloud segmentation architectures that are otherwise mostly tested on shapes.

03

Low-Cost & Reproducible

Collected using popular RealSense RGB-D sensors. All 80 objects are standard supermarket items — globally reproducible.

Figures from the Published Paper

MiniMarket80 dataset — 80 grocery objects with EAN codes

Figure 1. The MiniMarket80 dataset — 80 standard grocery objects, each identified by its EAN barcode. Objects span beverages, condiments, cereals, personal care, and tinned goods, ensuring diversity in size, shape, texture, and surface reflectance.

Dataset collection setup with 8 RealSense cameras

Figure 2. The data collection rig: 8 Intel RealSense RGB-D cameras (2× D415, 6× D435) arranged around a rotating table with controllable LED lighting. This setup captures 1,200 partial views per object at varying azimuth, elevation, and lighting conditions.

Target and distractor objects

Figure 5. Target vs. distractor. The task is binary: label every point as either the target object (EAN: 5410126116953, Biscoff Smooth) or background. Distractors include visually similar cylindrical containers — a deliberately hard case.

Point cloud resolutions: 1024, 2048, 4096, 8192 points

Figure 3. The same object at four point cloud resolutions: 1,024 · 2,048 · 4,096 · 8,192 points per sample. The dissertation uses 20,480 points per scene (10× the per-object resolution) to represent cluttered multi-object arrangements.
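Fixed-size inputs like the 1,024–8,192-point samples above are usually produced by resampling a raw scan to a set point count. A minimal sketch of that step, assuming plain random sampling (the paper may use a different scheme such as farthest-point sampling); `sample_points` is a hypothetical helper name:

```python
import numpy as np

def sample_points(points, n, seed=0):
    """Resample a point cloud to exactly n points.

    points: (M, C) array, e.g. C = 6 for XYZ+RGB.
    Samples without replacement when M >= n, with replacement
    otherwise, the common practice for fixed-size network inputs.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n, replace=len(points) < n)
    return points[idx]
```

The same routine covers both directions: downsampling a dense 200k-point scene and padding a sparse partial view up to the network's expected input size.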

Segmentation sample pairs with ground-truth masks

Figure 4. Segmentation sample pairs at 20,480 points (left) and 81,920 points (right). Red points belong to the target object; blue points are background. The binary mask is the ground truth used to train and evaluate all five models.

Inference results of 11 models on real-world scene

Figure 6. Inference results of all 11 evaluated models on real-world scene 7 (200,715 points). Red = target object; blue = background. Only four models partially succeed: PointNet, DGCNN, PointWeb, and FastPointTransformer. The remaining seven produce entirely incorrect or empty predictions.

Inference results across 10 real-world scenes for 4 models

Figure 7. Inference results of the second experiment across 10 real-world scenes (avg. 225,852 points/scene) for the four partially successful models: PointNet, DGCNN, PointWeb, and FastPointTransformer. Even the best performers produce significant false positives and false negatives, highlighting the difficulty of real-world unique item segmentation.

Keywords

real object dataset · 3D point cloud segmentation · data-driven segmentation · robotic grasping · unique item segmentation · MiniMarket80

Key Finding

Of 11 state-of-the-art models evaluated, only 4 can partially segment a target object in real-world scenes — and all still produce significant false positives and false negatives. This underscores the gap between benchmark performance and real-world deployment, and the importance of texture-rich, real-scan datasets like MiniMarket80.