Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (10): 14-28. doi: 10.12141/j.issn.1000-565X.240378

• Traffic Safety •

Identifying Key Causes of Accidents For Autonomous Vehicles Based on CTGAN

ZHANG Zhiqing  YU Xiaozheng  ZHU Leipeng  SUN Yufeng  LI Yixin   

  1. Beijing University of Technology, Beijing Key Laboratory of Traffic Engineering, Beijing 100124

  • Online: 2025-10-25 Published: 2025-05-06

Abstract:

Understanding the mechanisms behind autonomous vehicle accidents is a prerequisite for effectively preventing and controlling safety risks. Analyses of the key causes of these accidents usually rely on small, imbalanced samples, and models trained on such data predict minority classes poorly. An analysis framework based on data augmentation can improve prediction accuracy for the minority class. First, Conditional GAN for Generating Synthetic Tabular Data (CTGAN), Copula Flows for Synthetic Data Generation (CopulaGAN), the Synthetic Minority Over-sampling Technique (SMOTE), and the Adaptive Synthetic technique (ADASYN) are used to enlarge the sample and balance the dataset. Then, the best-performing machine learning classification algorithm is selected using the synthetic data. Finally, the SHAP framework is used to quantify the importance of each cause, so that the key causes of autonomous vehicle accidents can be identified accurately. The results show that CTGAN effectively mitigates the poor classification performance caused by small, imbalanced datasets. Training a random forest classifier on CTGAN-augmented data significantly improves the prediction of autonomous driving accidents. Road conditions, driving at night, and the degree of intersection and street transformation are the key causes of autonomous vehicle accidents. The results can serve as a reference for constructing test scenarios for autonomous vehicles and for the transformation of active road infrastructure.

Key words: autonomous vehicles, small sample size, data imbalance, CTGAN, accident prediction
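
The oversampling step named in the abstract can be illustrated with a minimal sketch of SMOTE, the simplest of the four augmentation methods listed. This is not the authors' implementation: the function names `smote` and `balance`, the toy feature values, and the binary-label assumption are all hypothetical, and a real study would more likely rely on an established library such as imbalanced-learn. The sketch also assumes purely numeric features, whereas accident records often contain categorical fields that need different handling.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        # Place the new row at a random point on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until it matches the rest (binary labels assumed)."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_rows = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Hypothetical toy data: six majority rows (label 0), three minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = [0] * 6 + [1] * 3

X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Because every synthetic row lies on a line segment between two existing minority rows, SMOTE cannot produce values outside the range already observed. Generative approaches such as CTGAN instead learn the joint distribution of the table, which is one plausible reason the abstract reports better results for them on small samples.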