Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (3): 1-11. doi: 10.12141/j.issn.1000-565X.240100

• Computer Science & Technology •

基于改进柱形特征编码的单阶段目标检测算法

罗玉涛, 毛浩杰

  1. 华南理工大学 机械与汽车工程学院/广东省汽车工程重点实验室,广东 广州 510640
  • 收稿日期:2024-03-05 出版日期:2025-03-10 发布日期:2024-04-26
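  • About the author: LUO Yutao (born 1972), male, professor and doctoral supervisor, mainly engaged in research on driverless vehicles and new energy vehicles. E-mail: ctytluo@scut.edu.cn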
  • 基金资助:
    工信部制造业高质量发展专项(R-ZH-023-QT-001-20221009-001);广州市科技计划项目(2023B01J0016)

Single-Stage Object Detection Algorithm with Enhanced Pillar Feature Encoding

LUO Yutao, MAO Haojie

  1. School of Mechanical and Automotive Engineering/Guangdong Provincial Key Laboratory of Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2024-03-05 Online: 2024-04-26 Published: 2025-03-10
  • Supported by:
    the Special Fund for High-Quality Development of Manufacturing Industry, the Ministry of Industry and Information Technology of China (R-ZH-023-QT-001-20221009-001); the Guangzhou Science and Technology Plan Project (2023B01J0016)

摘要:

基于柱形(Pillar)的单阶段点云3维目标检测算法凭借其较高的运行效率,在工业界得到了广泛的关注和应用。但对点云柱形量化造成的点云3维特征细粒度信息损失,导致这类算法对稀疏点云小目标的检测能力较弱。尽管部分研究对此问题提出了应对方法,但通常以较高的检测时间成本或者大目标检测精度作为代价。为此,该文提出了一种基于改进柱形特征编码的柱形点云目标检测算法。首先,构建可实现柱形单元内部点云局部与全局特征相结合的柱形特征编码网络,用于增强柱形量化特征的表征能力;然后,设计一个由2维稀疏卷积块与特征融合网络相结合的主干网络,用于融合多尺度的高级抽象语义特征和低级细粒度空间特征,防止过度关注小尺寸特征而降低大目标的检测性能;最后,在KITTI自动驾驶数据集上进行训练和测试,并对实验结果进行了可视化和消融研究。结果显示:所提算法在KITTI数据集的中等难度下,多个类别的平均精度均值达63.54%、平均方向相似性均值达70.72%,平均检测帧速率达31.5 f/s;与PointPillars、TANet和PiFEnet算法相比,该文算法的平均精度均值分别提高了2.44、2.05和2.38个百分点,平均方向相似性均值分别提高了4.69、0.68和7.83个百分点,在同类算法的对比中表现出工程应用潜力。

关键词: 智能汽车, 3维目标检测, 点云, 柱形特征编码

Abstract:

Pillar-based single-stage 3-dimensional object detection algorithms for point clouds have attracted wide attention and application in industry because of their high runtime efficiency. However, pillar quantization discards fine-grained information in the 3-dimensional point cloud features, which weakens the detection of small objects in sparse point clouds. Although some studies have proposed remedies, these usually come at the cost of either longer detection time or lower accuracy on large targets. To address this, this paper proposes a pillar-based point cloud object detection algorithm with enhanced pillar feature encoding. First, a pillar feature encoding network is constructed that combines the local and global features of the points within each pillar, strengthening the representation capability of the pillar-quantized features. Then, a backbone network combining 2-dimensional sparse convolution blocks with a feature fusion network is designed to fuse multi-scale high-level abstract semantic features with low-level fine-grained spatial features, so that an excessive focus on small-size features does not degrade detection performance on large targets. Finally, the model is trained and tested on the KITTI autonomous driving dataset, and the experimental results are visualized and examined through ablation studies. At the moderate difficulty level of the KITTI dataset, the proposed algorithm achieves a mean average precision of 63.54% across multiple categories, a mean average orientation similarity of 70.72%, and an average detection frame rate of 31.5 f/s. Compared with PointPillars, TANet, and PiFEnet, it improves the mean average precision by 2.44, 2.05, and 2.38 percentage points, respectively, and the mean average orientation similarity by 4.69, 0.68, and 7.83 percentage points, respectively, showing potential for engineering applications among comparable algorithms.

Key words: intelligent vehicle, 3-dimensional object detection, point cloud, pillar feature encoding
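
The first step described in the abstract, quantizing the point cloud into pillars and encoding each pillar from both per-point (local) and pillar-wide (global) information, can be illustrated with a small sketch. This is not the authors' network: the paper uses learned layers, whereas the hand-crafted features below (offsets from the pillar centroid as the local part, the pillar's coordinate maxima as the global part) are assumptions chosen only to show the data flow. The function names `group_into_pillars` and `encode_pillar`, and the parameters `pillar_size`, `x_min`, and `y_min`, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
PillarIndex = Tuple[int, int]


def group_into_pillars(
    points: List[Point], pillar_size: float, x_min: float, y_min: float
) -> Dict[PillarIndex, List[Point]]:
    """Quantize points into vertical pillars on the x-y plane.

    The z coordinate is ignored for binning, which is what makes a pillar
    an unbounded column rather than a voxel.
    """
    pillars: Dict[PillarIndex, List[Point]] = defaultdict(list)
    for x, y, z in points:
        ix = int((x - x_min) // pillar_size)
        iy = int((y - y_min) // pillar_size)
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars)


def encode_pillar(pts: List[Point]) -> Tuple[float, ...]:
    """Build one 6-value feature for a non-empty pillar.

    Local part (assumed): each point's offset from the pillar centroid.
    Global part (assumed): the per-axis maximum over the whole pillar.
    Both parts are concatenated per point and then max-pooled over points.
    """
    n = len(pts)
    centroid = tuple(sum(p[d] for p in pts) / n for d in range(3))
    global_part = tuple(max(p[d] for p in pts) for d in range(3))
    per_point = [
        tuple(p[d] - centroid[d] for d in range(3)) + global_part
        for p in pts
    ]
    # Max-pool each of the 6 channels over the points in the pillar.
    return tuple(max(f[k] for f in per_point) for k in range(6))
```

Calling `group_into_pillars` on a list of `(x, y, z)` tuples and then `encode_pillar` on each resulting group yields one fixed-length feature per occupied pillar, which is the kind of input a 2-dimensional backbone consumes. In a learned encoder the hand-crafted offsets and maxima would be replaced by trainable layers.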

CLC number: