Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (3): 1-11. doi: 10.12141/j.issn.1000-565X.240100

• Computer Science & Technology •

Single-Stage Object Detection Algorithm with Enhanced Pillar Feature Encoding

LUO Yutao, MAO Haojie

  1. School of Mechanical and Automotive Engineering / Guangdong Provincial Key Laboratory of Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2024-03-05 Online: 2025-03-10 Published: 2024-04-26
  • Supported by:
    the Special Fund for High-Quality Development of Manufacturing Industry, the Ministry of Industry and Information Technology of China (R-ZH-023-QT-001-20221009-001)

Abstract:

Pillar-based single-stage 3-dimensional object detection algorithms for point clouds have attracted significant attention and seen widespread industrial application owing to their high computational efficiency. However, pillar-based quantization discards fine-grained information in the 3-dimensional features of point clouds, which weakens the detection of small objects in sparse point clouds. Although some studies have proposed solutions to this problem, they often come at the cost of either longer detection time or reduced detection accuracy for large targets. To address this, this paper proposes a single-stage point cloud object detection algorithm with enhanced pillar feature encoding. Firstly, a pillar feature encoding network is constructed to combine local and global features of the points within each pillar cell, enhancing the representation capability of pillar-quantized features. Then, a backbone network that combines 2-dimensional sparse convolutional blocks with a feature fusion network is designed to fuse multi-scale high-level abstract semantic features with low-level fine-grained spatial features, so that excessive focus on small-size features does not degrade detection performance for large targets. Lastly, the model is trained and tested on the KITTI autonomous driving dataset, the experimental results are visualized, and ablation studies are conducted. The results show that, at the moderate difficulty level of the KITTI dataset, the proposed algorithm achieves a mean average precision of 63.54% across multiple categories, a mean average orientation similarity of 70.72%, and an average detection frame rate of 31.5 f/s. Compared with PointPillars, TANet, and PiFEnet, the mean average precision of the proposed algorithm is higher by 2.44, 2.05, and 2.38 percentage points, respectively, and the mean average orientation similarity is higher by 4.69, 0.68, and 7.83 percentage points, respectively, demonstrating potential for engineering applications.
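The idea of pillar quantization followed by an encoding that mixes per-point (local) and per-pillar (global) features can be illustrated with a minimal NumPy sketch. This is not the network described in the paper: the function name `encode_pillars`, the pillar size, the grid origin, the channel width, and the random weights standing in for learned layers are all assumptions made purely for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pillar_max_pool(feats, pillar_idx, num_pillars):
    """Channel-wise max over the points that fall into each pillar."""
    pooled = np.full((num_pillars, feats.shape[1]), -np.inf)
    np.maximum.at(pooled, pillar_idx, feats)
    return pooled

def encode_pillars(points, pillar_size=0.16, xy_min=(0.0, 0.0), channels=16, seed=0):
    """Hypothetical pillar encoder. points: (N, 4) array of x, y, z, intensity.
    Returns (pillar grid coordinates (P, 2), pillar features (P, channels))."""
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)

    # 1) Quantization: map each point to an (ix, iy) pillar cell on the x-y plane.
    cells = np.floor((points[:, :2] - np.asarray(xy_min)) / pillar_size).astype(np.int64)
    coords, pillar_idx = np.unique(cells, axis=0, return_inverse=True)
    pillar_idx = pillar_idx.reshape(-1)
    num_pillars = coords.shape[0]

    # 2) Decorate points with their offset from the pillar centroid, which keeps
    #    some of the fine-grained geometry that quantization would otherwise lose.
    counts = np.bincount(pillar_idx, minlength=num_pillars)
    sums = np.zeros((num_pillars, 3))
    np.add.at(sums, pillar_idx, points[:, :3])
    centroids = sums / counts[:, None]
    decorated = np.concatenate([points, points[:, :3] - centroids[pillar_idx]], axis=1)

    # 3) Local features: a per-point linear layer (random weights as a stand-in
    #    for learned parameters; assumption, not the paper's layer).
    w_local = rng.standard_normal((decorated.shape[1], channels))
    local = relu(decorated @ w_local)

    # 4) Global features: max-pool the local features within each pillar.
    global_feat = pillar_max_pool(local, pillar_idx, num_pillars)

    # 5) Combine: concatenate each point's local feature with its pillar's global
    #    feature, mix them with a second layer, and pool to one vector per pillar.
    w_mix = rng.standard_normal((2 * channels, channels))
    mixed = relu(np.concatenate([local, global_feat[pillar_idx]], axis=1) @ w_mix)
    return coords, pillar_max_pool(mixed, pillar_idx, num_pillars)

if __name__ == "__main__":
    pts = np.array([[0.05, 0.05, 0.0, 0.5],
                    [0.10, 0.02, 1.0, 0.2],
                    [1.00, 1.00, 0.0, 0.1]])
    coords, feats = encode_pillars(pts)
    print(coords)
    print(feats.shape)
```

In this sketch the first two points share a pillar and the third occupies its own, so the output contains one feature vector per non-empty pillar. A real detector would scatter these vectors back onto a 2-dimensional grid before passing them to the backbone.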

Key words: intelligent vehicle, 3-dimensional object detection, point cloud, pillar feature encoding
