Journal of South China University of Technology(Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (9): 126-137.doi: 10.12141/j.issn.1000-565X.210769

Special Issue: 2022 Mechanical Engineering

• Mechanical Engineering •

A Robot Grasping Policy Based on Viewpoint Selection Experience Enhancement Algorithm

WANG Gao1,2 CHEN Xiaohong1,2 LIU Ning2,3 LI Deping2,3   

  1. College of Information Science and Technology, Jinan University, Guangzhou 510632, Guangdong, China
    2.Robotics Intelligence Technology Research Institute,Jinan University,Guangzhou 510632,Guangdong,China
    3.School of Intelligent Systems Science and Engineering,Jinan University,Zhuhai 519070,Guangdong,China
  • Received:2021-12-06 Online:2022-09-25 Published:2022-02-11
  • Contact: LI Deping (1987-), male, PhD, lecturer, mainly engaged in research on 3D object pose estimation, robot grasping, and mobile robots. E-mail: lideping@jnu.edu.cn
  • About author: WANG Gao (1978-), male, PhD, associate researcher, mainly engaged in research on CNC, machine vision, and robotics. E-mail: twangg@jnu.edu.cn
  • Supported by:
    the National Natural Science Foundation of China(62172188)

Abstract:

To address the low success rate of vision-based robot grasping with a fixed environment camera in scenes of cluttered and stacked objects, an eye-in-hand camera viewpoint selection policy based on deep reinforcement learning is proposed to improve the accuracy and speed of vision-based grasping. First, a Markov decision process model is constructed for the robot's active-vision grasping task, transforming the viewpoint selection problem into one of solving a viewpoint value function. A deconvolutional network with an encoder-decoder structure is used to approximate the viewpoint action-value function, and reinforcement learning is carried out within the deep Q-network (DQN) framework. Then, to address the sparse-reward problem in reinforcement learning, a novel viewpoint experience enhancement algorithm is proposed: different enhancement methods are designed for successful and failed grasp attempts, and the reward region is expanded from a single point to a circular region, which improves the convergence speed of the approximation network. Preliminary experiments were conducted on a simulation platform, in which the robot model and grasping environment were built to carry out offline reinforcement learning. In this process, the proposed viewpoint experience enhancement algorithm effectively improves sample utilization and accelerates training: with it, the viewpoint action-value approximation network converges within 2 h. To verify practical applicability, the proposed viewpoint selection policy was then applied to real-world robot grasping experiments. The results show that viewpoint optimization based on this policy effectively improves the accuracy and speed of robot grasping.
Compared with general grasping methods, the proposed policy needs only one viewpoint selection per real-world grasp to locate a focus region with a high grasping success rate, and it also improves the efficiency of best-viewpoint computation. The grasping success rate in cluttered scenes is 22.8% higher than that of the single-view method, and the mean picks per hour reaches 294. Overall, the results indicate that the proposed policy is suitable for industrial application.
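The circular reward-region expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the linear decay from the centre, the specific radius values, and the choice to spread failed-grasp rewards over a smaller region are all assumptions made here for illustration. The idea shown is only the stated mechanism of turning a single grasp outcome into a dense circular target region on the viewpoint value map, so one experience supervises many pixels of the Q-network output instead of one.

```python
import numpy as np

def enhance_viewpoint_experience(q_target, viewpoint_rc, reward,
                                 radius=5, success=True):
    """Expand a single-point grasp reward into a circular reward region
    on the (H, W) viewpoint value map used as the network's target.

    q_target     : (H, W) array of regression targets for the value network
    viewpoint_rc : (row, col) pixel of the executed viewpoint
    reward       : scalar reward from the grasp attempt
    success      : failed attempts are enhanced over a smaller region
                   (an assumption of this sketch, standing in for the
                   paper's separate success/failure enhancement methods)
    """
    h, w = q_target.shape
    r = radius if success else radius // 2  # weaker spread for failures
    rows, cols = np.ogrid[:h, :w]
    dist = np.sqrt((rows - viewpoint_rc[0]) ** 2 +
                   (cols - viewpoint_rc[1]) ** 2)
    mask = dist <= r
    # Reward decays linearly from the centre to the edge of the circle,
    # densifying the otherwise sparse single-point reward signal.
    q_target[mask] = reward * (1.0 - dist[mask] / (r + 1))
    return q_target
```

Under this sketch, each stored transition yields a disc of supervised pixels rather than one, which is consistent with the abstract's claim that the enhancement raises sample utilization and speeds up convergence of the approximation network.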

Key words: robot grasping, reinforcement learning, robot vision, viewpoint selection, best viewpoint prediction, active perception approach, experience enhancement
