Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (10): 31-40. doi: 10.12141/j.issn.1000-565X.230503

• Computer Science & Technology •

Multi-Task Assisted Driving Policy Learning Method for Autonomous Driving

LUO Yutao, XUE Zhicheng

  1. School of Mechanical and Automotive Engineering / Guangdong Provincial Key Laboratory of Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2023-08-01 Published: 2024-10-25 Online: 2024-01-31
  • About author: LUO Yutao (born 1972), male, Ph.D., professor, whose research focuses on driverless vehicles and new energy vehicles. E-mail: ctytluo@scut.edu.cn
  • Supported by:
    the Special Fund for High-Quality Development of the Manufacturing Industry of the Ministry of Industry and Information Technology (R-ZH-023-QT-001-20221009-001); the Guangzhou Science and Technology Plan Project (2023B01J0016)

Abstract:

With the development of autonomous driving technology, deep reinforcement learning has become an important means of learning driving policies efficiently. However, autonomous driving must cope with complex and changeable traffic scenes, and existing deep reinforcement learning methods adapt to only a narrow range of scenes and converge slowly. To improve the scene adaptability and policy learning efficiency of autonomous vehicles, this paper proposes a multi-task assisted driving policy learning method. First, an encoder and multi-task decoder module is built on a deep residual network to compress high-dimensional driving scenes into low-dimensional representations; semantic segmentation, depth estimation, and speed prediction serve as auxiliary tasks that enrich the scene information carried by these representations. Next, the low-dimensional representation is used as the state input of a reinforcement-learning-based decision network, and a multi-constraint reward function is designed to guide the learning of the driving policy. Finally, simulation experiments are conducted in CARLA. The results show that, compared with classic methods such as DDPG and TD3, the proposed method improves the training process through multi-task assistance and learns a better driving policy. It achieves higher task success rates and driving scores in several typical urban driving scenarios, such as roundabouts and intersections, demonstrating strong decision-making capability and scene adaptability.

Key words: end-to-end autonomous driving, reinforcement learning, multi-task learning, driving policy, decision-making
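
The abstract names three auxiliary tasks and a multi-constraint reward function but gives no formulas for either. The sketch below is only an illustration of how such pieces are commonly put together, not the authors' formulation: it assumes the auxiliary losses are combined as a weighted sum and that the reward multiplies speed, lane-keeping, and heading terms, with a fixed penalty on collision. Every name, field, weight, and threshold here (`StepInfo`, `multi_task_loss`, `multi_constraint_reward`, `max_offset`, `collision_penalty`) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step signals a simulator wrapper might expose (hypothetical fields)."""
    speed: float          # current speed, m/s
    target_speed: float   # desired speed, m/s
    lane_offset: float    # lateral distance from the lane centre, m
    heading_error: float  # angle between vehicle heading and lane direction, rad
    collided: bool        # True if a collision occurred in this step


def multi_task_loss(seg_loss, depth_loss, speed_loss, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three auxiliary losses (the weights are assumed)."""
    w_seg, w_depth, w_speed = weights
    return w_seg * seg_loss + w_depth * depth_loss + w_speed * speed_loss


def multi_constraint_reward(info, max_offset=2.0, collision_penalty=-10.0):
    """Assumed reward: product of three terms in [0, 1], or a fixed penalty on collision."""
    if info.collided:
        return collision_penalty
    # Each term is 1.0 when the constraint is met exactly and falls linearly to 0.0.
    target = max(info.target_speed, 1e-6)  # guard against division by zero
    speed_term = 1.0 - min(abs(info.speed - info.target_speed) / target, 1.0)
    lane_term = 1.0 - min(abs(info.lane_offset) / max_offset, 1.0)
    heading_term = 1.0 - min(abs(info.heading_error) / (math.pi / 2), 1.0)
    return speed_term * lane_term * heading_term


if __name__ == "__main__":
    step = StepInfo(speed=5.0, target_speed=5.0, lane_offset=1.0,
                    heading_error=0.0, collided=False)
    print(multi_constraint_reward(step))
    print(multi_task_loss(0.5, 0.2, 0.1))
```

A multiplicative reward means that badly violating any single constraint drives the whole reward toward zero; an additive form would instead let a good speed term offset poor lane keeping. Which form the paper uses is not stated in the abstract.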

CLC number: