Traffic Safety

Identifying Key Causes of Accidents for Autonomous Vehicles Based on CTGAN

  • ZHANG Zhiqing,
  • YU Xiaozheng,
  • ZHU Leipeng,
  • SUN Yufeng,
  • LI Yixin
  • Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China
ZHANG Zhiqing (born 1965), male, Ph.D., professor, mainly engaged in road safety research. E-mail: zhangzhiqing@bjut.edu.cn

Received date: 2024-07-20

Online published: 2025-05-06

Supported by

the National Natural Science Foundation of China (52178403)

Abstract

Clarifying the mechanisms behind traffic accidents involving autonomous vehicles is an important prerequisite for preventing and controlling safety risks. Accident-causation analysis for autonomous vehicles is typically modeled on small, imbalanced datasets, which leads to low predictive accuracy for under-represented classes. An analytical framework based on data augmentation can improve model prediction accuracy for minority classes. The sample size was increased and the dataset was balanced using techniques such as conditional tabular generative adversarial network (CTGAN), Copula generative adversarial network (CopulaGAN), synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN), and the quality of the synthetic data produced by each method was compared. Using the synthetic data, five classification algorithms were evaluated: logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM). Metrics such as recall, specificity, weighted F1 score, and area under the ROC curve (AUC) were used to determine the optimal combination of augmentation method and classifier. Finally, the Shapley additive explanations (SHAP) framework was used to quantify the importance of the key factors contributing to accidents. The results show that data generated by CTGAN achieve the highest marginal distribution score (0.96) and correlation score (0.92), giving an average synthetic-data quality of 0.94, which is significantly better than the other methods. When CTGAN is combined with the random forest algorithm, the model performs well on metrics such as recall (0.82), specificity (0.84), and AUC (0.86), and it remains robust on test sets containing 10% label noise (with recall rising to 0.88), further supporting its applicability in complex scenarios.
The analysis of key contributing factors indicates that road surface condition (wet surfaces significantly increase the risk of injury), nighttime driving (low light degrades sensor performance), and the complexity of intersections and roadways (complex scenarios increase detection delays) are the core factors leading to accidents. This study provides a key basis for constructing autonomous driving test scenarios and upgrading road infrastructure.
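
The evaluation metrics named in the abstract (recall, specificity, weighted F1 score, and AUC) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names and the toy labels and scores below are hypothetical, chosen only to show how each metric is computed on an imbalanced binary problem where class 1 stands for the minority (for example, injury) class.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred, positive=1):
    """Share of actual negatives that were predicted negative."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred, positive)
    return tn / (tn + fp) if (tn + fp) else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp, _, fp, fn = confusion_counts(y_true, y_pred, positive=label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 minority-class samples out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.35, 0.05, 0.6, 0.15]

print("recall      =", round(recall(y_true, y_pred), 3))
print("specificity =", round(specificity(y_true, y_pred), 3))
print("weighted F1 =", round(weighted_f1(y_true, y_pred), 3))
print("AUC         =", round(auc(y_true, scores), 3))
```

Reporting recall and specificity side by side matters on imbalanced data because overall accuracy can look high even when most minority-class cases are missed; the weighted F1 score and AUC summarize the same trade-off across classes and across decision thresholds.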

Cite this article

ZHANG Zhiqing, YU Xiaozheng, ZHU Leipeng, SUN Yufeng, LI Yixin. Identifying Key Causes of Accidents for Autonomous Vehicles Based on CTGAN[J]. Journal of South China University of Technology (Natural Science), 2025, 53(10): 14-28. DOI: 10.12141/j.issn.1000-565X.240378
