
Based on Transformer

Abstract:

In robotic grasp detection, there is still considerable room to improve how efficiently RGB and depth images are processed. This article proposes a novel RGB-D cross-modal interactive fusion method for robotic grasp detection based on a hybrid Transformer-CNN architecture. To make full use of the feature information in RGB and depth images, an efficient cross-modal feature interaction fusion module is developed, which calibrates the corresponding RGB and depth features and interactively enhances the features of both modalities. In addition, a parallel Transformer-CNN network module is designed to combine the local modeling ability of the CNN with the global modeling ability of the Transformer, yielding a better feature representation and improved grasp detection performance. Experimental results show that the method achieves accuracies of 99.1% and 96.2% on the Cornell and Jacquard datasets, respectively. Grasp detection experiments in real scenes verify that the proposed method can effectively predict the grasp pose of objects in a variety of scenarios.
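
The abstract does not give the internals of the cross-modal fusion module, so the following is only a toy sketch of the general idea of letting each modality calibrate the other. It assumes a simple channel-gating scheme: each modality is globally average-pooled per channel, the pooled values pass through a sigmoid to form gates, and those gates re-weight the other modality before the two are summed. The function names (`channel_means`, `cross_modal_fuse`) and the gating formula are hypothetical illustrations, not the authors' design.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feat):
    """Global average pooling: one scalar per channel.

    `feat` is a list of channels; each channel is a flat list of floats.
    """
    return [sum(ch) / len(ch) for ch in feat]


def cross_modal_fuse(rgb, depth):
    """Toy cross-modal gating (an assumption, not the paper's module).

    Gates computed from depth re-weight the RGB channels and vice versa;
    each modality keeps a residual copy of itself, and the two enhanced
    feature maps are summed element-wise.
    """
    if len(rgb) != len(depth):
        raise ValueError("RGB and depth must have the same number of channels")

    rgb_gates = [sigmoid(m) for m in channel_means(depth)]   # depth calibrates RGB
    depth_gates = [sigmoid(m) for m in channel_means(rgb)]   # RGB calibrates depth

    fused = []
    for r_ch, d_ch, g_r, g_d in zip(rgb, depth, rgb_gates, depth_gates):
        if len(r_ch) != len(d_ch):
            raise ValueError("RGB and depth channels must have the same size")
        # Residual enhancement: x + gate * x for each modality, then sum.
        fused.append([r * (1.0 + g_r) + d * (1.0 + g_d)
                      for r, d in zip(r_ch, d_ch)])
    return fused


if __name__ == "__main__":
    rgb = [[1.0, 1.0], [0.5, 0.5]]      # 2 channels, 2 spatial positions
    depth = [[0.0, 0.0], [2.0, 2.0]]
    print(cross_modal_fuse(rgb, depth))
```

In a real network the gates would come from learned layers rather than a bare sigmoid of the pooled values, and the fused features would then feed the parallel Transformer and CNN branches described above.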
