A Robot Grasping Policy Based on Viewpoint Selection Experience Enhancement Algorithm

doi: 10.12141/j.issn.1000-565X.210769

Journal of South China University of Technology (Natural Science Edition), 2022, Vol. 50, Issue 9: 126-137.

Special topic: Mechanical Engineering 2022

WANG Gao^1,2, CHEN Xiaohong^1,2, LIU Ning^2,3, LI Deping^2,3

^1. College of Information Science and Technology, Jinan University, Guangzhou 510632, Guangdong, China
^2. Robotics Intelligence Technology Research Institute, Jinan University, Guangzhou 510632, Guangdong, China
^3. School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, Guangdong, China

Received: 2021-12-06 Publication date: 2022-09-25 Release date: 2022-02-11
Corresponding author: LI Deping (born 1987), male, Ph.D., lecturer; research interests: 3D object pose estimation, robot grasping, and mobile robots. E-mail: lideping@jnu.edu.cn
About the author: WANG Gao (born 1978), male, Ph.D., associate research fellow; research interests: numerical control and machine vision, and robotics. E-mail: twangg@jnu.edu.cn
Supported by: the General Program of the National Natural Science Foundation of China (62172188)

Abstract:

In robot grasping scenes where mixed objects are cluttered and stacked, vision-based grasping with a fixed-viewpoint camera suffers from a low success rate. To address this problem, an eye-in-hand camera viewpoint selection policy based on a deep reinforcement learning framework is proposed, so that the robot learns autonomously how to choose a suitable pose for its end-effector camera and thereby improves the accuracy and speed of vision-based grasping. First, a Markov decision process model is built for the robot's active-vision grasping task, which turns viewpoint selection into the problem of solving a viewpoint value function. A deconvolution network with an encoder-decoder structure is used to approximate the viewpoint action-value function, and it is trained by reinforcement learning within the deep Q-network framework. Then, to deal with the sparse rewards encountered during training, a novel viewpoint experience enhancement algorithm is proposed: separate enhancement schemes are designed for successful and failed grasps, and the reward region is expanded from a single point to a circular region, which speeds up the convergence of the approximation network.

Preliminary experiments were deployed on a simulation platform, where a robot model and a simulated grasping environment were built to carry out offline reinforcement learning. The proposed viewpoint experience enhancement algorithm effectively improves sample utilization and accelerates training; with it, the viewpoint action-value approximation network converges within 2 h. To verify the policy in practice, it was then applied to active-vision grasping experiments with a real robot. The results show that viewpoint optimization with this policy effectively improves both the accuracy and the speed of robot grasping. Compared with other methods, the proposed policy needs only a single viewpoint selection in real-world grasping to find a region with a high grasp success rate, which further improves the efficiency of best-viewpoint selection. Relative to the single-view method, the grasp success rate in cluttered scenes rises by 22.8 percentage points and the mean picks per hour reaches 294, indicating that the policy is feasible for industrial application.

Key words: robot grasping, reinforcement learning, robot vision, viewpoint selection, best viewpoint prediction, active perception approach, experience enhancement
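
The expansion of the reward region from a single point to a circular region can be pictured with a small sketch. It assumes, purely for illustration, that candidate viewpoints are laid out on a 2D grid with one action value per cell and that a grasp outcome normally labels only the cell that was tried. The function name `expand_reward_region`, the grid size, the radius, and the reward values are hypothetical and not taken from the paper. The abstract does not detail how the schemes for successful and failed grasps differ, so the sketch applies the same disc in both cases and only varies the reward value.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]


def expand_reward_region(
    center: Cell,
    reward: float,
    radius: float,
    grid_shape: Tuple[int, int],
) -> Dict[Cell, float]:
    """Return reward labels for every grid cell within `radius` of `center`.

    A single grasp attempt would label only `center`; labelling the whole
    disc yields many training targets from that one attempt.
    """
    rows, cols = grid_shape
    cr, cc = center
    reach = int(radius)  # largest integer offset that can lie inside the disc
    labels: Dict[Cell, float] = {}
    # Clip the bounding box of the disc to the grid, then keep cells in the disc.
    for r in range(max(0, cr - reach), min(rows, cr + reach + 1)):
        for c in range(max(0, cc - reach), min(cols, cc + reach + 1)):
            if (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2:
                labels[(r, c)] = reward
    return labels


if __name__ == "__main__":
    # Hypothetical successful grasp at cell (5, 5) on a 10 x 10 grid.
    labels = expand_reward_region(center=(5, 5), reward=1.0, radius=2, grid_shape=(10, 10))
    print(len(labels))  # 13 labelled cells instead of 1
```

With a radius of 2 cells, one attempt produces 13 labelled cells rather than one, which is the sense in which such a scheme raises sample utilization under sparse rewards.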

CLC number: TP391

WANG Gao, CHEN Xiaohong, LIU Ning, et al. A Robot Grasping Policy Based on Viewpoint Selection Experience Enhancement Algorithm[J]. Journal of South China University of Technology (Natural Science Edition), 2022, 50(9): 126-137.




Item	Description
Operating system	Ubuntu 20.04 LTS
Processor	Intel(R) Core(TM) i7-7700K
Memory	16 GB
Graphics card	NVIDIA GeForce GTX 1080, 8 GB

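Metric	Description
Grasp success rate	Ratio of successful grasps to grasp attempts over all test rounds. Measures the robot's basic grasping ability.
Scene clearance rate	Proportion of rounds, out of n grasping rounds, in which the scene was cleared. The grasping task defines a failure condition; if that condition is reached, the scene counts as not cleared. Measures the robot's ability to handle complex grasping scenes.
Mean grasp time	Mean time per grasp attempt, whether successful or not, in s. Measures how adding viewpoint selection affects the execution time of a grasp.
Mean picks per hour	Measures the overall efficiency of the robot grasping system; computed as $N = \dfrac{3\,600}{\text{mean grasp time}} \times \text{success rate}$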
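
The picks-per-hour formula in the last row can be written out in a few lines of Python. The function name `picks_per_hour` and the sample values (10 s per attempt, 80% success) are illustrative assumptions, not figures from the paper; the sketch only illustrates the formula and is not meant to reproduce the reported results.

```python
def picks_per_hour(mean_grasp_time_s: float, success_rate: float) -> float:
    """N = 3600 / mean grasp time (s) * success rate (fraction in [0, 1])."""
    if mean_grasp_time_s <= 0:
        raise ValueError("mean grasp time must be positive")
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success rate must be a fraction between 0 and 1")
    attempts_per_hour = 3600.0 / mean_grasp_time_s  # grasp attempts in one hour
    return attempts_per_hour * success_rate  # only successful attempts count


if __name__ == "__main__":
    # Hypothetical system: one attempt every 10 s, 80% of attempts succeed.
    print(picks_per_hour(10.0, 0.8))
```

Because the metric multiplies attempt throughput by success rate, a slower method can still score higher if its extra time per attempt buys enough additional successes.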

Method	Grasp success rate/%	Mean grasp time/s	Mean picks per hour
Proposed algorithm	82.7	9.8	294
Fixed single view	59.9	9.1	269
Fixed multi-view	79.9	11.9	227
Active multi-view	80.3	11.3	248
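
The 22.8% improvement quoted in the abstract is the difference between the success rates of the proposed algorithm and the fixed single-view baseline in the table above. A minimal sketch of that arithmetic follows; the dictionary `success_rate`, its English keys, and the helper `gain_over` are hypothetical names introduced here, while the numbers are copied from the table.

```python
# Grasp success rates (%) copied from the comparison table above.
success_rate = {
    "proposed": 82.7,
    "fixed single view": 59.9,
    "fixed multi-view": 79.9,
    "active multi-view": 80.3,
}


def gain_over(baseline: str) -> float:
    """Percentage-point gain of the proposed algorithm over `baseline`."""
    return round(success_rate["proposed"] - success_rate[baseline], 1)


if __name__ == "__main__":
    for name in success_rate:
        if name != "proposed":
            print(f"{name}: +{gain_over(name)} points")
```

The gain over the single-view baseline comes out at 22.8 points, while the gains over the two multi-view baselines are 2.8 and 2.4 points.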
