Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (12): 1-. doi: 10.12141/j.issn.1000-565X.240549

• Intelligent Transportation System •    

Joint Optimization of Traffic Signal Timing and Vehicle Trajectories Using Hierarchical Soft Actor-Critic Reinforcement Learning

MA Yingying, LI Teng1, LIANG Yunyi2, TANG Meng1

  1. Department of Transportation Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China;

    2. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China

  • Online: 2025-12-25 Published: 2025-07-04

Abstract:

This study proposes a joint optimization method for intersection signal timing and vehicle trajectories based on the Soft Actor-Critic (SAC) reinforcement learning framework. The model consists of two layers: a signal timing optimization layer and a vehicle trajectory optimization layer. The state space of both layers comprises vehicle position, vehicle speed, and traffic signal status, and the reward function is a weighted sum of traffic efficiency, safety, and fuel consumption. In the signal timing layer the action is the duration of the signal phase; in the vehicle trajectory layer the action is the vehicle acceleration. Each layer has its own value network and policy network. The value network takes the current state and action and outputs the state-action value, which is used to assess the policy network's performance. The policy network takes the current state, outputs the mean and standard deviation of a Gaussian distribution, and samples the action from that distribution. The loss function of the policy network includes an entropy term and a temperature coefficient, which automatically adjust the breadth and depth of policy exploration and reduce the model's sensitivity to hyperparameter variations. Because signal timing and vehicle trajectories are optimized at different time intervals, an asynchronous training algorithm is designed for the two layers. Within each layer, the value network and the policy network are trained simultaneously by backpropagation. The model is trained and evaluated in the SUMO traffic simulator. Experimental results indicate that the proposed method reduces vehicle fuel consumption by an average of 24.24%, 5.39%, and 22.23% compared with mathematical programming methods, signal-timing-only optimization methods, and trajectory-only optimization methods, respectively.
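
The two mechanisms described in the abstract, sampling an action from a state-dependent Gaussian and letting the two layers decide at different intervals, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the action bounds (10 to 30 s for phase duration, -3 to 3 m/s² for acceleration), the tanh squashing (a common SAC practice that the abstract does not mention), and the rule that the signal layer decides only when the current phase expires are all assumptions made for illustration. Fixed constants stand in for the mean and log standard deviation that the policy networks would produce.

```python
import math
import random


def sample_gaussian_action(mean, log_std, low, high, rng):
    """Sample from N(mean, std^2), squash with tanh, rescale to [low, high].

    `mean` and `log_std` stand in for policy-network outputs (assumption).
    """
    std = math.exp(log_std)
    raw = rng.gauss(mean, std)
    squashed = math.tanh(raw)  # bounded to [-1, 1]
    return low + 0.5 * (squashed + 1.0) * (high - low)


def run_episode(total_steps=60, seed=0):
    """Hypothetical asynchronous decision loop with 1 s simulation steps.

    The trajectory layer acts at every step; the signal layer acts only
    when the previously chosen phase duration has elapsed.
    """
    rng = random.Random(seed)
    signal_decisions = []   # (step, phase duration in steps)
    vehicle_decisions = []  # (step, acceleration)
    phase_ends_at = 0

    for step in range(total_steps):
        if step >= phase_ends_at:
            # Signal layer: choose the duration of the next phase.
            duration = round(sample_gaussian_action(0.0, 0.0, 10, 30, rng))
            signal_decisions.append((step, duration))
            phase_ends_at = step + duration

        # Trajectory layer: choose an acceleration at every step.
        accel = sample_gaussian_action(0.0, -0.5, -3.0, 3.0, rng)
        vehicle_decisions.append((step, accel))

    return signal_decisions, vehicle_decisions


if __name__ == "__main__":
    signals, vehicles = run_episode()
    print(len(signals), "signal decisions;", len(vehicles), "vehicle decisions")
```

In a full training loop each layer would also store its own transitions and update its own value and policy networks; because the signal layer produces far fewer transitions per episode than the trajectory layer, the two updates cannot share a single step counter, which is the mismatch the asynchronous training algorithm is meant to handle.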

Key words: connected and autonomous vehicles, signal-controlled intersections, joint optimization of signal timing and vehicle trajectories, hierarchical reinforcement learning, soft actor-critic reinforcement learning