Journal of South China University of Technology (Natural Science Edition) ›› 2026, Vol. 54 ›› Issue (3): 65-78. doi: 10.12141/j.issn.1000-565X.250092

• Intelligent Transportation System •

CGT-YOLO-Based Algorithm for Small-Target Traffic Sign Recognition

XING Yan1,2, GUO Sihao1, ZHANG Zhen2,3, PAN Xiaodong2,3, AN Dong1,4   

  1. School of Transportation and Geomatics Engineering, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China
  2. National Engineering Research Center for Road Traffic Safety Control Technology, Shenyang 110168, Liaoning, China
  3. Traffic Management Detachment of Shenyang Public Security Bureau, Shenyang 110168, Liaoning, China
  4. Shenyang Cambrian Transportation Technology Co., Ltd., Shenyang 110168, Liaoning, China
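  • Received: 2025-04-01 Online: 2026-03-25 Published: 2025-10-31
  • Contact: GUO Sihao (b. 1999), male, M.S., mainly engaged in intelligent transportation research. E-mail: 27129433@qq.com
  • About author: XING Yan (b. 1985), male, Ph.D., professor, mainly engaged in intelligent transportation research. E-mail: xingyan@sjzu.edu.cn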
  • Supported by:
    the Open Project of National Engineering Research Center for Road Traffic Safety Control Technology(2024GCZXKFKT13B)

Abstract:

To address the degradation in recognition accuracy caused by false and missed detections of small-target traffic signs, this study proposes a small-target traffic sign recognition algorithm based on CGT-YOLO. First, a context-aware enhancement module (CAM) is introduced to replace the spatial pyramid pooling fast (SPPF) module in the YOLOv5s network. By employing parallel dilated convolutions with different dilation rates, the CAM enhances the multiscale feature representation and contextual information of small traffic signs without reducing spatial resolution. Second, a global attention mechanism (GAM) is inserted after the concatenation operation in the backbone network of YOLOv5s. The GAM takes the features enhanced by the CAM and strengthens global interaction between the channel and spatial dimensions through 3D permutation, a multi-layer perceptron, and convolutional spatial attention, thereby highlighting the features of small traffic signs and mitigating the negative effects of complex backgrounds and long distances. Finally, a task-specific context (TSC) decoupled head is used to separate the features for the classification and localization tasks. Through its semantic context encoder (SCE) and detail preservation encoder (DPE) modules, the head generates semantically rich low-resolution feature maps for classification and high-resolution feature maps containing boundary information for localization, respectively. This disentangles the two tasks at the feature source and resolves the feature conflict between classification and localization for small-target traffic signs. Experimental results on a dataset constructed by integrating TT100K and CCTSDB show that the improved model performs better on all metrics: the missed detection rate and false detection rate are reduced by 12.1 and 11.6 percentage points, respectively, while mAP(0.50:0.95) increases by 0.0260.
Compared with models such as YOLOv8s, NanoDet-Plus, and RT-DETR-Nano, CGT-YOLO demonstrates superior performance across multiple metrics. While maintaining a high inference speed (72.5 FPS), it effectively reduces false and missed detections, significantly improving the detection accuracy and robustness for small-target traffic signs in complex scenarios.
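
The claim that parallel dilated convolutions widen context "without reducing spatial resolution" can be checked with the standard convolution output-size formula. The sketch below is only an illustration, not the authors' CAM implementation: the dilation rates (1, 3, 5), the 3x3 kernel, and the 20x20 feature-map size are assumptions chosen for the example, since the abstract does not state them. The function names are likewise hypothetical.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def conv_output_size(size: int, kernel: int, dilation: int,
                     padding: int, stride: int = 1) -> int:
    """Standard output-length formula for a (dilated) convolution."""
    span = effective_kernel(kernel, dilation)
    return (size + 2 * padding - span) // stride + 1


def parallel_branches(size: int, rates=(1, 3, 5), kernel: int = 3):
    """Return (rate, covered span, output size) for each parallel branch.

    Assumed setup: stride 1 and padding = rate * (kernel - 1) // 2,
    which keeps every branch at the input resolution.
    """
    rows = []
    for rate in rates:
        padding = rate * (kernel - 1) // 2
        rows.append((rate,
                     effective_kernel(kernel, rate),
                     conv_output_size(size, kernel, rate, padding)))
    return rows


if __name__ == "__main__":
    for rate, span, out in parallel_branches(20):
        print(f"dilation={rate}: covers {span}x{span}, output {out}x{out}")
```

Under these assumptions each branch keeps the 20x20 resolution while the area seen by a single output position grows from 3x3 to 11x11, so the branch outputs can be fused element-wise or by concatenation without resampling.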

Key words: small target recognition, traffic sign recognition, dilated convolution, attention mechanism, decoupled head

CLC Number: