YOLOv5 Guided 3D Point Cloud Segmentation
POSTER
Abstract
This study designed an end-to-end implementation of a You Only Look Once (YOLOv5)-guided point cloud segmentation pipeline that efficiently localizes objects in 3D point cloud data collected from an Intel RealSense D415 RGB-D camera. The low-level camera interface was built on the open-source Robot Operating System (ROS). We used the Yale-CMU-Berkeley (YCB) object dataset to train a YOLOv5 model. Under the assumption that the object is stationary and the base of the robot arm is fixed, we defined the camera's coordinates relative to the world frame and the captured object's coordinates relative to the camera. YOLOv5, one of the fastest 2D object detection models, was implemented in Python and used to label and localize each object with a 2D bounding box. Given the depth data and the location of the robot arm, the detected 2D region was lifted into 3D real-world coordinates via perspective back-projection from the image plane. To complete the image-to-world transformation of the segmented point cloud, we applied a reference frame transformation from the local camera coordinate system to the global coordinate system. Post-processing and normal vector removal were performed to clean the noisy point cloud data. Finally, the density-based spatial clustering of applications with noise (DBSCAN) algorithm was applied to cluster the point cloud; the cluster with the greatest number of points was assumed to be the object and was then bounded by a cuboid volume to represent the segmented object's orientation. Owing to the reduced complexity of this segmentation pipeline, the system is well suited for wheelchair-mounted robotic grasping systems that support individuals with reduced mobility.
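As a minimal illustration of the geometric and clustering steps described above, the Python sketch below lifts the pixels inside a detected 2D bounding box into world coordinates and keeps the largest DBSCAN cluster. The function names, camera intrinsics, camera-to-world transform, and clustering parameters (eps, min_samples) are illustrative assumptions, and scikit-learn's DBSCAN is used here as a stand-in for the clustering step; none of these are taken from the original implementation.

import numpy as np
from sklearn.cluster import DBSCAN

def backproject_bbox(depth, bbox, fx, fy, cx, cy, cam_to_world):
    """Lift the pixels inside a 2D detection box into world coordinates.

    depth        : HxW depth image in meters
    bbox         : (u_min, v_min, u_max, v_max) in pixel coordinates
    fx, fy, cx, cy : pinhole intrinsics of the RGB-D camera (assumed values)
    cam_to_world : 4x4 homogeneous camera-to-world transform (assumed)
    """
    u_min, v_min, u_max, v_max = bbox
    vs, us = np.mgrid[v_min:v_max, u_min:u_max]
    z = depth[vs, us]
    valid = z > 0                                   # drop missing depth returns
    us, vs, z = us[valid], vs[valid], z[valid]

    # Perspective back-projection: pixel + depth -> 3D point in the camera frame.
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)

    # Reference frame transformation from camera coordinates to world coordinates.
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world

def largest_cluster(points, eps=0.02, min_samples=20):
    """Cluster the segmented point cloud and keep the cluster with the most
    points, which the pipeline assumes to be the detected object."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    kept = labels[labels >= 0]                      # -1 marks noise points
    if kept.size == 0:
        return points                               # nothing clustered; return as-is
    best = np.bincount(kept).argmax()
    return points[labels == best]

The returned object points could then be enclosed in a bounding volume (for example, an axis-aligned or oriented box) to represent the object's position and orientation, as described in the abstract.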
Presenters
- Edward T Sun, Stony Brook University

Authors
- Edward T Sun, Stony Brook University
- Benjamin X Wen, Stony Brook University