Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (6): 1-11. doi: 10.12141/j.issn.1000-565X.230262

• Green and Intelligent Transportation •

Visual SLAM Algorithm Based on Memory Parking Scene

HU Xizhi, CUI Bofei, WANG Qin, LIU Hong

  1. School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2023-04-22 Online: 2024-06-25 Published: 2023-10-27
  • Contact: CUI Bofei (born 1998), male, master's student, working mainly on intelligent driving and new energy vehicles. E-mail: klysxc616@163.com
  • About author: HU Xizhi (born 1963), male, Ph.D., associate professor, working mainly on vehicle dynamics and intelligent driving. E-mail: huxizhi@scut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (51975219)

Abstract:

As autonomous driving technology develops, visual simultaneous localization and mapping (SLAM) has attracted increasing attention. Memory parking requires a prior map of the parking lot; when the vehicle enters the same parking lot again, visual SLAM is used for mapping and localization. To make the map built by SLAM more robust, accurate and efficient, a lightweight deep learning algorithm is first used to address the poor robustness of traditional feature extraction algorithms across different scenes, with depthwise separable convolution replacing standard convolution, which greatly improves feature extraction efficiency. Next, the Patch-NetVLAD algorithm is improved on the basis of the ResNet network, and both the improved residual network and the original VGG network are retrained on the MSLS dataset. Image retrieval provides coarse localization by selecting candidate image frames, and fine localization then solves the camera pose, completing relocalization for global initialization. On this basis, the improved bag-of-words algorithm is retrained on images from different parking lot scenes, and all algorithms are ported to the OpenVSLAM framework for mapping and localization in real scenes. Experimental results show that the proposed visual SLAM system can map multiple scene types, including aboveground parking lots, underground parking lots and outdoor semi-enclosed campus roads, with an average longitudinal localization error of 8.42 cm and an average lateral localization error of 8.30 cm, both of which meet engineering requirements.

Key words: simultaneous localization and mapping, memory parking, deep learning, feature extraction, image retrieval
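
The efficiency gain that the abstract attributes to depthwise separable convolution can be illustrated with a simple cost count. The sketch below is not taken from the paper: the layer sizes are hypothetical, and it assumes stride 1, "same" padding and no bias terms. It compares a standard k x k convolution with a k x k depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Parameters and multiply-accumulates (MACs) of a k x k standard convolution.

    Assumes stride 1, 'same' padding (output is h x w) and no bias.
    """
    params = k * k * c_in * c_out
    macs = params * h * w
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Same quantities for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


# Hypothetical layer (not from the paper): 3x3 kernel, 64 -> 128 channels,
# 60 x 80 output feature map.
std_params, std_macs = standard_conv_cost(3, 64, 128, 60, 80)
sep_params, sep_macs = depthwise_separable_cost(3, 64, 128, 60, 80)
ratio = sep_params / std_params

print(f"standard:  {std_params} params, {std_macs} MACs")
print(f"separable: {sep_params} params, {sep_macs} MACs")
print(f"cost ratio: {ratio:.4f}")
```

Under these assumptions the cost ratio equals 1/c_out + 1/k², so for 3 x 3 kernels the separable form needs roughly one ninth of the computation once the output channel count is large. The actual speed-up in a deployed network also depends on memory access and hardware support, which this count ignores.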

CLC Number: