Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (10): 14-28. doi: 10.12141/j.issn.1000-565X.240378

• Traffic Safety •

Identifying Key Causes of Accidents for Autonomous Vehicles Based on CTGAN

ZHANG Zhiqing (张志清), YU Xiaozheng (于晓正), ZHU Leipeng (朱雷鹏), SUN Yufeng (孙玉凤), LI Yixin (李祎昕)

  1. Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124

  • Published: 2025-10-25   Online: 2025-05-06

Abstract:

Understanding the mechanisms of traffic accidents involving autonomous vehicles is an important prerequisite for effectively preventing and controlling safety risks. Analyses of the causes of autonomous vehicle accidents are usually modeled on small-sample, imbalanced data, and such models have low prediction accuracy for minority classes. An analysis framework based on data augmentation can improve the model's prediction accuracy for minority classes. First, the conditional tabular generative adversarial network (CTGAN), the copula-based generative adversarial network (CopulaGAN), the Synthetic Minority Over-sampling Technique (SMOTE), and the Adaptive Synthetic sampling technique (ADASYN) are used to increase the sample size and balance the dataset. Then, the best machine learning classification algorithm is determined based on the synthetic data. Finally, the SHAP framework is used to quantify the importance of the key causes of accidents, so that the key causes of autonomous vehicle accidents can be analyzed accurately. The results show that CTGAN can effectively address the poor classification performance caused by small sample sizes and imbalanced datasets; that training the model with CTGAN combined with the random forest classification algorithm significantly improves its prediction performance for autonomous vehicle accidents; and that road surface conditions, driving at night, intersections, and the degree of streetization are the key causes of autonomous vehicle accidents. The findings can serve as a reference for constructing autonomous vehicle test scenarios and for retrofitting in-service road infrastructure.

Key words: autonomous vehicles, small sample size, data imbalance, CTGAN, accident prediction
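
The balancing step named in the abstract can be illustrated with the simplest of the four techniques, SMOTE, whose core idea is to create a synthetic minority record by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal standard-library illustration of that idea, not the authors' implementation: the function name `smote_oversample`, the toy two-feature records, and the choice of `k` are all assumptions made for this example. It handles numeric features only, and it does not attempt to reproduce CTGAN or CopulaGAN, which are neural generative models.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two numeric features per record.
majority = [(float(x), float(y)) for x in range(6) for y in range(5)]  # 30 rows
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0), (12.0, 11.0)]  # 4 rows

# Generate enough synthetic minority rows to match the majority class size.
new_rows = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because every synthetic row is a convex combination of two real minority rows, it always lies on the segment between them. This keeps the new records close to the observed minority region, but it also means the method cannot produce categorical values or feature combinations outside those segments, which is one reason tabular generative models such as CTGAN are considered as alternatives.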