Traffic and Transportation Engineering

Recognition Model of Highway Toll Evasion Behavior Considering Cost-Sensitivity

  • 1. School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
    2. Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing Jiaotong University, Beijing 100044, China
    3. TransChina (Beijing) Technology Co., Ltd., Beijing 100088, China
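ZHAO Jiandong (赵建东, b. 1975), male, Ph.D., professor; his research focuses on transportation big data and intelligent connected transportation.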

Received date: 2023-03-03

Online published: 2023-06-20

Supported by

the National Natural Science Foundation of China (72288101); the National Key Research and Development Program (2019YFB1600200)

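Cite this article

赵建东, 许慧玲, 吕行, et al. Recognition model of highway toll evasion behavior considering cost-sensitivity[J]. Journal of South China University of Technology (Natural Science Edition), 2024, 52(5): 10-19. DOI: 10.12141/j.issn.1000-565X.230078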

Abstract

To improve the efficiency of toll evasion inspection on highways, this paper proposes a model for recognizing vehicle toll evasion behavior based on electronic toll collection (ETC) data, combining the K-nearest neighbor (KNN) algorithm, adaptive boosting (Adaboost), and a cost-sensitive learning mechanism. First, because the raw ETC toll transaction data are large in volume and redundant, rules for data discretization and standardization are established to repair and normalize the data, after which two types of toll evasion features are extracted. Second, by analyzing the ETC data set, seven types of toll evasion, such as large vehicles with small tags, are selected as the main research objects. Third, to address the low classification efficiency caused by the “high-dimensional” nature of the evasion data, the best feature subset characterizing toll evasion is selected through Pearson and Spearman correlation analysis and ReliefF importance analysis. Fourth, to address the model overfitting caused by the class “imbalance” between toll-evading and normal vehicles, a combined classification model is built with KNN as the base classifier in Adaboost: TomekLinks undersampling is first applied to reduce the ambiguity of the boundary between classes, and a cost-sensitive learning mechanism is then introduced to increase the model’s emphasis on the minority class (toll-evading vehicles) and reduce its bias toward the majority class (normal vehicles). Finally, the performance of the KNN-Adaboost model with cost-sensitive learning is verified by comparing how well different classification models recognize each type of evasion event. The results show that the proposed model achieves a precision of 0.98, a recall of 0.96, an F1-score of 0.97, and a Kappa coefficient of 0.95. Compared with the other models, it handles the class imbalance problem better and recognizes minority-class samples with higher accuracy, and it can serve as a reference for improving the efficiency of highway toll inspection.
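
The abstract does not give implementation details for the TomekLinks undersampling step, so the following is only a minimal from-scratch sketch of the general technique, not the authors' code. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; removing the majority-class member of each such pair cleans up the class boundary. The function names, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def undersample(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels) for k in pair
            if labels[k] == majority_label}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: label 0 = normal vehicle, 1 = toll-evading vehicle.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0), (3.2, 3.1)]
labels = [0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
kept_points, kept_labels = undersample(points, labels, majority_label=0)
print(links)        # the normal sample at (1.0, 1.0) is linked to an evading one
print(kept_labels)  # that normal sample has been removed
```

In this toy set only the normal sample at (1.0, 1.0) sits right next to a toll-evading sample, so it is the only one removed; the well-separated samples are untouched. A real pipeline would more likely rely on an existing library implementation and feed the cleaned data to the cost-sensitive classifier.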
