Vehicle Engineering


Dynamic-Static Feature Fusion for Autonomous Driving Scenes Vectorization Representation

  • School of Mechanical and Automotive Engineering / Guangdong Provincial Key Laboratory of Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China

Online published: 2026-01-20


How to cite this article

梁伟强, 罗玉涛, 孙艾宁, et al. Dynamic-Static Feature Fusion for Autonomous Driving Scenes Vectorization Representation [J]. Journal of South China University of Technology (Natural Science Edition), 0: 1. DOI: 10.12141/j.issn.1000-565X.250338

Abstract

Existing research on autonomous driving scene representation tends to encode sensor data into high-dimensional features to retain as much raw information as possible. However, this approach introduces substantial background noise irrelevant to decision-making and incurs a heavy computational burden, especially when fusing historical features to extract temporal dynamics. To address these challenges, this paper proposes DSVec (Dynamic-Static Vectorization), a vectorized scene representation method based on dynamic-static feature fusion. First, complex traffic scenes are abstracted into a series of structured dynamic and static vector elements. On this basis, a dynamic-static feature fusion network is designed: a Variational Auto-Encoder (VAE) extracts low-dimensional static features from semantic Bird's-Eye-View (BEV) maps, these are combined with historical trajectory features of dynamic obstacles extracted via a graph structure, and a Temporal Convolutional Network (TCN) aligns the heterogeneous features precisely in space and time. Subsequently, a Transformer decoder augmented with a category-level masking mechanism is introduced to eliminate redundant information through attention, realizing both deep fusion of dynamic and static features and independent, decoupled reconstruction of each class of vectorized elements. Finally, a deep reinforcement learning decision-making model based on the Soft Actor-Critic (SAC) algorithm is built on the compact vectorized state space generated by DSVec and validated end-to-end in closed-loop simulation on the high-fidelity CARLA platform. Experimental results show that the proposed method reconstructs both dynamic and static elements with high accuracy, and that, compared with traditional rasterization-based representations, DSVec significantly improves the environmental adaptability, decision-making safety, and computational efficiency of autonomous vehicles in complex dynamic scenarios such as roundabouts and unprotected intersections.
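The category-level masking idea described in the abstract, where each decoder query may attend only to tokens of the categories it is allowed to reconstruct, can be sketched in plain Python. This is a minimal single-query, single-head illustration under assumed names and token layout, not the paper's actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax; -inf entries get exactly zero weight."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def masked_attention(query, keys, values, categories, allowed):
    """Scaled dot-product attention for one query with a category-level mask.

    Tokens whose category is not in `allowed` are assigned a score of -inf,
    so after the softmax they contribute nothing to the output -- this is
    how redundant (out-of-category) information is excluded. At least one
    token must belong to an allowed category.
    """
    d = len(query)
    scores = []
    for k, cat in zip(keys, categories):
        if cat in allowed:
            scores.append(sum(q * ki for q, ki in zip(query, k)) / math.sqrt(d))
        else:
            scores.append(float("-inf"))  # masked out before the softmax
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, vi in enumerate(v):
            out[i] += w * vi
    return out, weights
```

With tokens tagged `"static"` (map elements) and `"dynamic"` (obstacle trajectories), a decoder query responsible for a static element class passes `allowed={"static"}` and receives zero attention weight on every dynamic token, which is the decoupled per-category reconstruction the masking mechanism enables.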

Keywords: autonomous driving; vectorized scene representation; trajectory prediction
