Journal of South China University of Technology (Natural Science Edition), 2024, Vol. 52, Issue (5): 10-19. doi: 10.12141/j.issn.1000-565X.230078

• Transportation Engineering •

Recognition Model of Highway Toll Evasion Behavior Considering Cost-Sensitivity

ZHAO Jiandong1,2, XU Huiling1, LÜ Xing1, LI Pingan3, HUANG Shiyin3

  1. School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
  2. Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing Jiaotong University, Beijing 100044, China
  3. TransChina (Beijing) Technology Co., Ltd., Beijing 100088, China
  • Received: 2023-03-03; Issue date: 2024-05-25; Published online: 2023-06-19
  • About author: ZHAO Jiandong (1975-), male, Ph.D., professor; his research focuses on transportation big data and intelligent connected transportation.
  • Supported by:
    National Natural Science Foundation of China (72288101); National Key Research and Development Program of China (2019YFB1600200)

Abstract:

To improve the efficiency of highway toll evasion inspection, this paper proposes a toll evasion recognition model built on Electronic Toll Collection (ETC) data that combines the K-Nearest Neighbor (KNN) algorithm, the adaptive boosting (Adaboost) ensemble algorithm, and a cost-sensitive learning mechanism. First, because the raw ETC toll transaction records are voluminous and redundant, rules for data discretization and standardization were developed to repair and normalize the data, after which two categories of toll evasion features were extracted. Second, based on an analysis of the ETC data set, seven types of toll evasion, such as large vehicles carrying small-vehicle tags, were selected as the main research objects. Third, because the high dimensionality of the evasion data makes classification inefficient, Pearson and Spearman correlation analysis together with ReliefF importance analysis were used to select the feature subset that best characterizes evasion behavior. Fourth, to address the overfitting caused by the class imbalance between toll evasion vehicles and normal vehicles, a combined classifier was built with KNN as the base classifier of Adaboost: TomekLinks undersampling was first applied to reduce the blurring of the boundary between classes, and a cost-sensitive learning mechanism was then introduced to increase the model's emphasis on the minority class (toll evasion vehicles) and reduce its bias toward the majority class (normal vehicles). Finally, the performance of the cost-sensitive KNN-Adaboost model was verified by comparing how well different classification models recognize each type of evasion event.
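The abstract does not specify how the cost-sensitive mechanism enters Adaboost, which costs were used, or how a KNN base learner is made to respect sample weights. The sketch below is one plausible reading under stated assumptions, not the authors' method: it uses an AdaC2-style update (each sample's weight is multiplied by its misclassification cost both in the learner weight alpha and in the weight update), applies the boosting weights to KNN by weighted resampling (KNN has no native sample weights), and labels evasion vehicles +1. The function names, the 5:1 cost ratio, `k=3`, and `n_rounds=10` are hypothetical. Feature selection and TomekLinks undersampling are not shown.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote of the k nearest training points (Euclidean); labels in {-1, +1}."""
    # Squared distances between every query row and every training row
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)


def fit_cs_knn_adaboost(X, y, cost, n_rounds=10, k=3, seed=0):
    """Hypothetical cost-sensitive Adaboost (AdaC2-style) with KNN base learners.

    y:    labels in {-1, +1}; +1 = toll evasion vehicle (minority class).
    cost: per-sample misclassification cost (assumed values, larger for +1).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    D = np.full(n, 1.0 / n)  # boosting weights, start uniform
    eps = 1e-10              # keeps alpha finite when a round makes no errors
    ensemble = []            # list of (alpha, X_boot, y_boot)
    for _ in range(n_rounds):
        # KNN cannot take sample weights, so apply them by weighted resampling
        idx = rng.choice(n, size=n, replace=True, p=D)
        Xb, yb = X[idx], y[idx]
        pred = knn_predict(Xb, yb, X, k)
        cd = cost * D
        num = cd[pred == y].sum()  # cost-weighted mass classified correctly
        den = cd[pred != y].sum()  # cost-weighted mass misclassified
        alpha = 0.5 * np.log((num + eps) / (den + eps))
        if alpha <= 0:  # no better than chance under the cost weighting
            break
        ensemble.append((alpha, Xb, yb))
        # Cost multiplies the update, so costly (minority) samples gain weight fastest
        D = cost * D * np.exp(-alpha * y * pred)
        D /= D.sum()
    return ensemble


def predict(ensemble, X_query, k=3):
    """Sign of the alpha-weighted sum of base-learner votes."""
    score = np.zeros(len(X_query))
    for alpha, Xb, yb in ensemble:
        score += alpha * knn_predict(Xb, yb, X_query, k)
    return np.where(score >= 0, 1, -1)
```

Weighted resampling is used here only because it lets an unweighted learner such as KNN participate in boosting; the paper may handle this differently.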
The results show that the proposed model achieves a precision of 0.98, a recall of 0.96, an F1-score of 0.97, and a Kappa coefficient of 0.95. Compared with the other models, it handles the class imbalance better and recognizes minority-class samples with higher accuracy, providing a reference for improving the efficiency of highway toll inspection.
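The four reported metrics can all be derived from a binary confusion matrix with the toll evasion vehicle as the positive class. The sketch below shows the standard definitions; the abstract does not say whether the reported values are per evasion type or averaged over the seven types, so only the binary case is shown, and the function name `binary_metrics` is illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and Cohen's kappa, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = n - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    p_o = (tp + tn) / n  # observed agreement (accuracy)
    # Chance agreement: products of predicted and actual marginal totals
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}
```

For example, with 3 true positives, 1 false negative, 1 false positive and 5 true negatives, precision, recall and F1 are all 0.75, while kappa is (0.80 - 0.52) / (1 - 0.52), about 0.583. Kappa is lower than accuracy because it discounts the agreement expected by chance, which is why it is informative on imbalanced data.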

Key words: highway transport, ensemble learning, machine learning, cost-sensitivity, feature selection