YOLOv5 Guided 3D Point Cloud Segmentation
POSTER
Abstract
This study designed an end-to-end implementation of a You Only Look Once (YOLOv5)-guided point cloud segmentation pipeline that efficiently localizes objects in 3D point cloud data collected from an Intel RealSense D415 RGB-D camera. The low-level camera interface was built on the open-source Robot Operating System (ROS). We used the Yale-CMU-Berkeley (YCB) object dataset to train a YOLOv5 model. Under the assumption that the object is stationary and the base of the robot arm is fixed, we defined the camera's coordinates relative to the world frame and the captured object's coordinates relative to the camera. YOLOv5, one of the fastest 2D object detection models, was implemented in Python and used to label and localize each object with a 2D bounding box. Given the depth data and the location of the robot arm, the detected 2D region was lifted into 3D real-world coordinates via perspective back-projection from the image plane. To complete the image-to-world transformation of the segmented point cloud, we applied a reference frame transformation from the local camera coordinate system to the global coordinate system. Post-processing and normal vector removal were performed to clean the noisy point cloud data. Finally, the density-based spatial clustering of applications with noise (DBSCAN) algorithm was applied to cluster the point cloud; the cluster with the greatest number of points was assumed to be the object and was then bounded by a cuboid volume to represent the segmented object's orientation. Owing to the reduced complexity of this segmentation pipeline, the system is well suited for wheelchair-mounted robotic grasping systems that support individuals with reduced mobility.
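As a minimal illustration of the geometric and clustering steps described above, the Python sketch below lifts the pixels inside a detected 2D bounding box into world coordinates and keeps the largest DBSCAN cluster. The function names, camera intrinsics, camera-to-world transform, and clustering parameters (eps, min_samples) are illustrative assumptions, and scikit-learn's DBSCAN is used here as a stand-in for the clustering step; none of these are taken from the original implementation.

import numpy as np
from sklearn.cluster import DBSCAN

def backproject_bbox(depth, bbox, fx, fy, cx, cy, cam_to_world):
    """Lift the pixels inside a 2D detection box into world coordinates.

    depth        : HxW depth image in meters
    bbox         : (u_min, v_min, u_max, v_max) in pixel coordinates
    fx, fy, cx, cy : pinhole intrinsics of the RGB-D camera (assumed values)
    cam_to_world : 4x4 homogeneous camera-to-world transform (assumed)
    """
    u_min, v_min, u_max, v_max = bbox
    vs, us = np.mgrid[v_min:v_max, u_min:u_max]
    z = depth[vs, us]
    valid = z > 0                                   # drop missing depth returns
    us, vs, z = us[valid], vs[valid], z[valid]

    # Perspective back-projection: pixel + depth -> 3D point in the camera frame.
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)

    # Reference frame transformation from camera coordinates to world coordinates.
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world

def largest_cluster(points, eps=0.02, min_samples=20):
    """Cluster the segmented point cloud and keep the cluster with the most
    points, which the pipeline assumes to be the detected object."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    kept = labels[labels >= 0]                      # -1 marks noise points
    if kept.size == 0:
        return points                               # nothing clustered; return as-is
    best = np.bincount(kept).argmax()
    return points[labels == best]

The returned object points could then be enclosed in a bounding volume (for example, an axis-aligned or oriented box) to represent the object's position and orientation, as described in the abstract.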
Presenters
- Edward T Sun, Stony Brook University

Authors
- Edward T Sun, Stony Brook University
- Benjamin X Wen, Stony Brook University