Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (4): 101-114. doi: 10.12141/j.issn.1000-565X.220237

Special Issue: 2023 Traffic & Transportation Engineering

• Traffic & Transportation Engineering •

Traffic Data Imputation Based on Self-Supervised Learning

ZHOU Chuhao (1), LIN Peiqun (1), YAN Mingyue (2)

  1. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510640, Guangdong, China
  2. Highway Monitoring & Response Center, MOT, Beijing 100088, China
  • Received: 2022-04-27 Online: 2023-04-25 Published: 2022-12-02
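  • Contact: LIN Peiqun (born 1980), male, professor and doctoral supervisor, whose research mainly covers traffic big data and intelligent transportation. E-mail: pqlin@scut.edu.cn
  • About author: ZHOU Chuhao (born 1994), male, Ph.D., whose research mainly covers traffic big data and intelligent transportation. E-mail: 505192138@qq.com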
  • Supported by:
    the National Natural Science Foundation of China (52072130); the Natural Science Foundation of Guangdong Province (2020A1515010349)

Abstract:

A regional highway network contains numerous toll stations that generate massive amounts of data every day. Because of equipment and network problems, however, data transmission from some stations may be delayed, and the data received in time may then be insufficient for real-time traffic flow prediction. To support real-time traffic data imputation and dynamic traffic flow prediction, this paper proposes a self-supervised learning method for imputing highway traffic flow data. The method adopts an attention-based time series model (Seq2Seq-Att) and trains it with self-supervised learning. Its reliability was verified on 80 toll stations in the highway network of Guangdong Province. The results show that the method flexibly captures the missing patterns in traffic data and fills in reasonable values according to the internal correlations of the data. It is generally superior to the other methods compared and performs well under different missing rates, with an overall MAPE of about 17.7% and a WMAPE of 12.8%. At high missing rates, its advantage over the other methods is obvious. Traffic volume prediction results indicate that forecasts made with data completed by this method are nearly as accurate as forecasts made with complete data.
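
As an illustration only, the two ideas the abstract relies on can be sketched in a few lines of Python: building self-supervised training pairs by hiding values that were actually observed, and scoring imputed values with MAPE and WMAPE. The abstract does not give the paper's masking strategy or its exact metric formulas, so everything here is an assumption: the function names `mask_observed`, `mape`, and `wmape` are hypothetical, the masking is uniform random, and the metrics use their common textbook definitions. The Seq2Seq-Att model itself is not reproduced.

```python
import random


def mask_observed(series, mask_rate, rng):
    """Hide a random share of the observed values in one flow series.

    Missing readings are represented as None. Returns the masked copy
    (model input) and the sorted indices that were hidden, whose original
    values serve as training targets. Uniform random masking is an
    assumption, not the paper's documented procedure.
    """
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        return list(series), []
    n_hide = min(len(observed), max(1, round(len(observed) * mask_rate)))
    hidden = sorted(rng.sample(observed, n_hide))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def mape(actual, predicted):
    """Mean absolute percentage error; zero actual values are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def wmape(actual, predicted):
    """Weighted MAPE: total absolute error divided by total actual volume."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / sum(abs(a) for a in actual)


# Hypothetical hourly flows for one toll station; None marks delayed data.
flows = [120, None, 95, 130, 88, None, 101, 140]
masked_flows, target_idx = mask_observed(flows, 0.5, random.Random(0))

# Hypothetical true and imputed values, used only to exercise the metrics.
true_values = [100, 200, 50]
imputed_values = [110, 180, 60]
print(mape(true_values, imputed_values), wmape(true_values, imputed_values))
```

Because WMAPE weights each error by traffic volume, low-flow periods inflate it less than they inflate MAPE, which is consistent with the reported WMAPE (12.8%) being lower than the reported MAPE (about 17.7%).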

Key words: data imputation, self-supervised learning, traffic flow prediction, machine learning, highway

CLC Number: