Panoptic-MOPE: Panoptic 3D Mapping and Object Pose Estimation Using Adaptively Weighted Semantic Information

Abstract

We present a system that reconstructs highly detailed object-level models and estimates the 6D pose of objects using an RGB-D camera. In this work, we integrate deep-learning-based semantic segmentation, instance segmentation, and 6D object pose estimation into a state-of-the-art RGB-D mapping system. We use the ElasticFusion pipeline as a backbone and modify its registration cost function to make full use of semantic class labels. The proposed objective function has tunable weights for the depth, appearance, and semantic information channels, and these weights are learned from data. We also introduce a fast semantic segmentation and registration weight prediction convolutional neural network (Fast-RGBD-SSWP) designed for efficient computation. In addition, our approach performs 6D object pose estimation from multiple viewpoints, supported by the high-quality reconstruction system. We validate the method experimentally on the YCB-Video dataset and a warehouse dataset. The results show that the proposed system compares favorably with other state-of-the-art systems in surface reconstruction, segmentation quality, and object pose estimation accuracy.
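To make the idea of a channel-weighted registration objective concrete, here is a minimal sketch. It assumes that the cost is a weighted sum E = w_d * E_depth + w_a * E_app + w_s * E_sem of squared residuals. It also assumes that the semantic term simply penalizes disagreement between the class label stored in the model and the label predicted for the current frame. The names `ChannelWeights` and `registration_cost` are hypothetical. The weights are fixed numbers here, whereas in the described system they are predicted by Fast-RGBD-SSWP. The sketch only evaluates the cost for a given set of residuals and does not perform the pose optimization itself.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass
class ChannelWeights:
    """Hypothetical per-frame weights for the depth, appearance, and semantic channels."""
    depth: float
    appearance: float
    semantic: float


def sum_of_squares(residuals: Sequence[float]) -> float:
    """Sum of squared residuals for one information channel."""
    return sum(r * r for r in residuals)


def semantic_residuals(model_labels: Sequence[Hashable],
                       frame_labels: Sequence[Hashable]) -> list:
    """Assumed semantic residual: 0.0 where labels agree, 1.0 where they differ."""
    if len(model_labels) != len(frame_labels):
        raise ValueError("model and frame label sequences must have equal length")
    return [0.0 if m == f else 1.0 for m, f in zip(model_labels, frame_labels)]


def registration_cost(depth_res: Sequence[float],
                      photo_res: Sequence[float],
                      model_labels: Sequence[Hashable],
                      frame_labels: Sequence[Hashable],
                      w: ChannelWeights) -> float:
    """Weighted sum of the depth, appearance, and (assumed) semantic energies."""
    if min(w.depth, w.appearance, w.semantic) < 0:
        raise ValueError("channel weights must be non-negative")
    e_depth = sum_of_squares(depth_res)
    e_app = sum_of_squares(photo_res)
    e_sem = sum_of_squares(semantic_residuals(model_labels, frame_labels))
    return w.depth * e_depth + w.appearance * e_app + w.semantic * e_sem


if __name__ == "__main__":
    weights = ChannelWeights(depth=1.0, appearance=0.1, semantic=0.5)
    cost = registration_cost([0.01, -0.02], [0.1, 0.2], [1, 2, 3], [1, 2, 4], weights)
    print(round(cost, 6))
```

In this example the single label mismatch dominates the cost. It shows how a larger semantic weight would steer the registration toward label-consistent alignments, while a smaller one would let geometry and appearance dominate.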

Code and more information: https://sites.google.com/view/panoptic-mope