Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (11): 44-55. doi: 10.12141/j.issn.1000-565X.220747

Special Topic: Electronics, Communication and Automatic Control (2023)


  • Supported by:
    the National Natural Science Foundation of China (62263030) and the Youth Science Fund of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C86)

Driverless Obstacle Avoidance and Tracking Control Based on Improved DDPG

LI Xinkai HU Xiaocheng MA Ping ZHANG Hongli   

  1. School of Electrical Engineering,Xinjiang University,Urumqi 830017,Xinjiang,China
  • Received:2022-11-14 Online:2023-11-25 Published:2023-03-28
  • About author: LI Xinkai (b. 1991), male, Ph.D., lecturer, whose research focuses on intelligent control and complex nonlinear control. E-mail: lxk@xju.edu.cn


Abstract:

In the tracking and obstacle avoidance control of driverless vehicles, the controlled object is nonlinear and its control parameters vary. A linear model or a fixed mathematical model of the vehicle can hardly guarantee safety and stability in complex environments, and the discretized control process further increases the control difficulty. To improve the accuracy of real-time trajectory tracking while reducing the difficulty of the whole control process, this paper proposes an obstacle avoidance and tracking control algorithm for driverless vehicles based on the Monte Carlo deep deterministic policy gradient (MC-DDPG). The algorithm builds the control system model on a deep reinforcement learning network and, during policy-learning sampling, uses the Monte Carlo method to optimize the network training gradient, distinguishing good training samples from bad ones. Only the superior samples are used to search for the optimal network parameters through the gradient algorithm, which strengthens the learning ability of the network and yields better continuous control of the driverless vehicle. Simulation experiments carried out in the TORCS simulation environment show that the MC-DDPG algorithm effectively achieves obstacle avoidance and tracking control of the driverless vehicle, and that the vehicle under its control outperforms both the deep Q-network (DQN) algorithm and the standard DDPG algorithm in tracking accuracy and obstacle avoidance.
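The sample-selection idea described in the abstract, scoring transitions by their discounted Monte Carlo return and feeding only the better ones into the network update, can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function names, discount factor, and keep fraction are assumptions.

```python
# Hypothetical sketch: rank replay transitions by Monte Carlo return and
# keep only the top fraction ("excellent" samples) for the DDPG update.

def monte_carlo_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # G_t = r_t + gamma * G_{t+1}
        returns[t] = g
    return returns

def select_elite_transitions(episodes, gamma=0.99, keep_frac=0.5):
    """Pool transitions from several episodes; keep the best by MC return.

    Each episode is a list of (state, action, reward) tuples; the selected
    (return, state, action) triples would feed the actor/critic update.
    """
    scored = []
    for ep in episodes:
        gs = monte_carlo_returns([r for (_, _, r) in ep], gamma)
        scored.extend((g, s, a) for g, (s, a, _) in zip(gs, ep))
    scored.sort(key=lambda x: x[0], reverse=True)  # best returns first
    n_keep = max(1, int(len(scored) * keep_frac))
    return scored[:n_keep]
```

In a full MC-DDPG training loop, the elite transitions would replace uniform replay-buffer sampling before the usual actor and critic gradient steps.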

Key words: self-driving, dynamic obstacle avoidance, deep deterministic policy gradient, trajectory tracking, gradient optimization

CLC number: