Journal of South China University of Technology (Natural Science Edition)

• Special Topic: Digital and Intelligent Transportation •


Adaptive Traffic Signal Control Method Based on a Multi-layer Heterogeneous Distillation Graph Neural Network

CHEN Yuguang1,2, HAI Lingtao1, ZHANG Shun1, GAO Jiayao1, GUO Fengxiang1

  1. Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;

    2. School of Transportation, Southeast University, Nanjing 210096, Jiangsu, China

  • Published: 2026-01-23


Abstract:

Deep reinforcement learning (DRL) has been widely applied to adaptive traffic signal control (ATSC), but existing algorithms cannot capture a comprehensive intersection state and neglect the influence of complex traffic flow composition on signal control performance. This paper proposes a deep learning algorithm based on a knowledge-distillation heterogeneous graph neural network (KAHGN-Q), which extracts traffic flow information from every approach of the target intersection and from the influencing approaches of adjacent intersections, yielding a complete and comprehensive intersection state representation. A new graph neural network input architecture is built that models traffic flow at three levels, dividing nodes into direction-level, lane-type-level, and vehicle-type-level nodes, thereby coupling macroscopic traffic flow with microscopic vehicle composition characteristics. D3QN reinforcement learning with prioritized experience replay incorporating dynamic priorities (PERDP) is employed to continuously learn the optimal action-selection policy while ensuring full coverage of all candidate policies. In the reward design, different weights are assigned to different vehicle types, which can realize bus priority. Experimental results show that the KAHGN-Q algorithm has advantages in reducing the average waiting time and average delay of vehicles.
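The abstract does not give the paper's formulas, so the two most concrete ideas — a vehicle-type-weighted reward that can realize bus priority, and experience replay whose priority influence changes dynamically so that all stored strategies retain a chance of being replayed — can only be illustrated as a minimal sketch. All names, weight values, and the decay schedule below are hypothetical assumptions, not the authors' method:

```python
import random

# Hypothetical vehicle-type weights (not given in the abstract): buses are
# weighted more heavily, so reducing their waiting time contributes more
# to the reward -- one way a weighted reward can realize bus priority.
VEHICLE_WEIGHTS = {"bus": 3.0, "car": 1.0, "truck": 1.5}

def weighted_reward(waiting_times):
    """Negative weighted sum of per-vehicle waiting times.

    `waiting_times` maps vehicle type -> list of waiting times in seconds.
    """
    return -sum(
        VEHICLE_WEIGHTS.get(vtype, 1.0) * sum(times)
        for vtype, times in waiting_times.items()
    )

class DynamicPriorityReplay:
    """Toy prioritized replay with a dynamically decaying priority exponent.

    Early in training, sampling is driven almost purely by TD-error
    priority; as the exponent decays toward zero, sampling approaches
    uniform, so every stored transition keeps a nonzero replay chance.
    This mimics the spirit (not the specifics) of dynamic priorities.
    """

    def __init__(self, alpha_start=1.0, alpha_end=0.0, decay_steps=10000):
        self.buffer, self.priorities = [], []
        self.alpha_start, self.alpha_end = alpha_start, alpha_end
        self.decay_steps, self.step = decay_steps, 0

    def add(self, transition, td_error):
        # Small epsilon keeps zero-error transitions sampleable.
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def sample(self, k):
        # Linearly anneal the priority exponent alpha toward alpha_end.
        frac = min(self.step / self.decay_steps, 1.0)
        alpha = self.alpha_start + (self.alpha_end - self.alpha_start) * frac
        self.step += 1
        weights = [p ** alpha for p in self.priorities]
        return random.choices(self.buffer, weights=weights, k=k)
```

In a full D3QN agent, `weighted_reward` would be computed from simulator observations each control step, and minibatches drawn from the replay buffer would train the dueling double Q-network.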

Key words: urban transportation, traffic signal control, deep reinforcement learning, heterogeneous graph neural network, knowledge distillation