Identifying Key Causes of Accidents for Autonomous Vehicles Based on CTGAN
Received date: 2024-07-20
Published online: 2025-05-06
Supported by the National Natural Science Foundation of China (52178403)
Keywords: autonomous vehicles; small sample size; data imbalance; conditional tabular generative adversarial network; accident prediction
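ZHANG Zhiqing, YU Xiaozheng, ZHU Leipeng, SUN Yufeng, LI Yixin. Identifying Key Causes of Accidents for Autonomous Vehicles Based on CTGAN[J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(10): 14-28. DOI: 10.12141/j.issn.1000-565X.240378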
Clarifying the mechanism of traffic accidents involving autonomous vehicles is an important prerequisite for effectively preventing and controlling safety risks. Analyses of accident causation for autonomous vehicles are typically built on small-sample, imbalanced data, and the resulting models have low predictive accuracy for minority classes. An analytical framework based on data augmentation can improve the prediction accuracy of models for minority classes. The conditional tabular generative adversarial network (CTGAN), the Copula generative adversarial network (CopulaGAN), the synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN) were used to increase the sample size and balance the dataset, and the quality of the synthetic data produced by each method was compared. On the synthetic data, five classification algorithms were evaluated: logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM). Recall, specificity, weighted F1 score, and area under the ROC curve (AUC) were used to determine the optimal combination of augmentation method and classifier. Finally, the Shapley additive explanations (SHAP) framework was used to quantify the importance of the key contributing factors to accidents. The results show that data generated by CTGAN achieve the highest marginal distribution score (0.96) and correlation score (0.92), with an average synthetic data quality of 0.94, significantly better than the other methods. When CTGAN is combined with the random forest algorithm, the model performs well on recall (0.82), specificity (0.84), and AUC (0.86), and it remains robust on a test set containing 10% label noise (where recall rises to 0.88), further supporting its applicability in complex scenarios.
The analysis of key contributing factors indicates that road surface condition (wet surfaces significantly increase the risk of injury), nighttime driving (low light degrades sensor performance), and intersections and roadway complexity (complex scenarios increase detection delay) are the core factors leading to accidents. This study provides a key basis for building autonomous driving test scenarios and renovating road infrastructure.
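The factor rankings above are derived from SHAP values, which attribute a prediction to its input features using Shapley values from cooperative game theory: feature i receives the weighted average of v(S ∪ {i}) − v(S) over all coalitions S of the other features, with weight |S|!(n − |S| − 1)!/n!. For a handful of features this can be computed exactly by enumerating every coalition. The sketch below assumes a single reference (baseline) record, so a coalition S takes the record's values for features in S and the baseline values elsewhere. The `shap` library instead uses faster model-specific algorithms such as TreeSHAP and averages over a background dataset. The three binary features and the coefficients of `toy_risk` are invented for illustration and are not the paper's fitted model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A coalition S keeps x's values for features in S and uses the baseline
    value for every other feature. Cost is O(2^n) model calls, so this is
    only practical for a few features.
    """
    n = len(x)
    features = range(n)
    cache: Dict[FrozenSet[int], float] = {}

    def value(coalition: FrozenSet[int]) -> float:
        # v(S): model output with features outside S reset to the baseline.
        if coalition not in cache:
            z = [x[i] if i in coalition else baseline[i] for i in features]
            cache[coalition] = model(z)
        return cache[coalition]

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical injury-risk score over [wet_surface, night, intersection]."""
    wet, night, intersection = z
    return 0.1 + 0.3 * wet + 0.2 * night + 0.1 * intersection + 0.2 * wet * night


if __name__ == "__main__":
    print(shapley_values(toy_risk, [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]))
```

For the record [1, 1, 0] against an all-zero baseline, the wet-surface feature receives about 0.4 (its own 0.3 effect plus half of the 0.2 wet-by-night interaction), night receives about 0.3, and intersection receives 0 because it equals its baseline value. The three values sum to toy_risk(x) − toy_risk(baseline) = 0.7, which is the efficiency property that SHAP attributions satisfy.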