Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (11): 44-55. doi: 10.12141/j.issn.1000-565X.220747

Special Topic: Electronics, Communication and Automatic Control (2023)


  • Supported by:
    the National Natural Science Foundation of China (62263030) and the Youth Science Fund of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C86)

Driverless Obstacle Avoidance and Tracking Control Based on Improved DDPG

LI Xinkai HU Xiaocheng MA Ping ZHANG Hongli   

  1. School of Electrical Engineering,Xinjiang University,Urumqi 830017,Xinjiang,China
  • Received:2022-11-14 Online:2023-11-25 Published:2023-03-28
  • About author: LI Xinkai (b. 1991), male, Ph.D., lecturer, whose research focuses on intelligent control and complex nonlinear control. E-mail: lxk@xju.edu.cn


Abstract:

In the tracking and obstacle avoidance control of driverless vehicles, the controlled object is nonlinear and its control parameters vary. A linear model or a fixed mathematical model of the vehicle can hardly guarantee safety and stability in complex environments, and the discretized control process further increases the control difficulty. To improve the accuracy of real-time trajectory tracking while reducing the difficulty of the whole control process, this paper proposes an obstacle avoidance and tracking control algorithm for driverless vehicles based on the Monte Carlo deep deterministic policy gradient (MC-DDPG). The algorithm builds the control system model on a deep reinforcement learning network and, during policy-learning sampling, uses the Monte Carlo method to optimize the network training gradient, distinguishing good training samples from bad ones. Only the superior samples are used to search for the optimal network parameters through the gradient algorithm, which strengthens the learning ability of the network and yields better continuous control of the driverless vehicle. Simulation experiments carried out in the TORCS simulation environment show that the MC-DDPG algorithm effectively achieves obstacle avoidance and tracking control of the driverless vehicle, and that the vehicle under its control outperforms both the deep Q-network (DQN) algorithm and the standard DDPG algorithm in tracking accuracy and obstacle avoidance.
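The sample-selection idea described in the abstract, scoring transitions by their discounted Monte Carlo return and feeding only the better ones into the network update, can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function names, discount factor, and keep fraction are assumptions.

```python
# Hypothetical sketch: rank replay transitions by Monte Carlo return and
# keep only the top fraction ("excellent" samples) for the DDPG update.

def monte_carlo_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # G_t = r_t + gamma * G_{t+1}
        returns[t] = g
    return returns

def select_elite_transitions(episodes, gamma=0.99, keep_frac=0.5):
    """Pool transitions from several episodes; keep the best by MC return.

    Each episode is a list of (state, action, reward) tuples; the selected
    (return, state, action) triples would feed the actor/critic update.
    """
    scored = []
    for ep in episodes:
        gs = monte_carlo_returns([r for (_, _, r) in ep], gamma)
        scored.extend((g, s, a) for g, (s, a, _) in zip(gs, ep))
    scored.sort(key=lambda x: x[0], reverse=True)  # best returns first
    n_keep = max(1, int(len(scored) * keep_frac))
    return scored[:n_keep]
```

In a full MC-DDPG training loop, the elite transitions would replace uniform replay-buffer sampling before the usual actor and critic gradient steps.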

Key words: self-driving, dynamic obstacle avoidance, deep deterministic policy gradient, trajectory tracking, gradient optimization

CLC number: