Multimodal Scene Memory and Instruction-Prompted Target Navigation

doi:10.12141/j.issn.1000-565X.250152

Abstract:

Target navigation requires a robot to plan a path autonomously and reach a specified target accurately, given a natural language instruction or a target category. Existing methods fall into two main classes: end-to-end learning and planning-based approaches. End-to-end methods learn a direct mapping from perception to action, but they commonly suffer from poor generalization and limited interpretability. Planning-based methods improve both to some extent, yet they are not optimized for known environments, ignore the hints contained in natural language instructions, struggle to stop precisely at a specified distance from the target, and execute inefficiently. To address these problems, this paper proposes MEMO-Nav, a target navigation method based on multimodal scene memory and instruction hints that aims to improve navigation in known environments. The method uses a hierarchical architecture. The upper planning layer maintains a multimodal scene memory that records environment information and uses a large language model (LLM) to parse the target and the hints in a natural language instruction; it then combines the parsed instruction with the scene memory to filter waypoints and plan navigation efficiently. The lower execution layer provides the basic navigation functions, namely robot localization and movement, and integrates an object detection model with a depth camera to localize the target object precisely. Together, the two layers form a complete target navigation system that finds the target and docks at the specified distance from it. Experiments were run repeatedly on the GAZEBO simulation platform and in real environments. The results show that, in known environments, the proposed method clearly improves navigation efficiency, success rate, and docking distance accuracy over existing methods. The proposed method thus offers a feasible way for mobile robots to achieve efficient, interpretable, and precise target navigation in practical scenarios.

Key words: mobile robot, target navigation, path planning, large language model, multimodal
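
The two steps the abstract describes, filtering waypoints from scene memory using the parsed target and hint, and docking at a specified distance from the detected target, can be illustrated with a minimal sketch. This is not the MEMO-Nav implementation: the `Waypoint` fields, the function names, and the rule "prefer waypoints in the hinted room, otherwise fall back to all waypoints where the target was seen" are assumptions made for illustration. The target and hint are assumed to have already been extracted by the LLM as plain strings, and the target position is assumed to have already been estimated from the detector and depth camera in the map frame.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Waypoint:
    """Hypothetical scene-memory entry: a map position plus what was observed there."""
    name: str
    x: float
    y: float
    room: str
    objects: set = field(default_factory=set)


def filter_waypoints(memory, target, hint_room=None):
    """Keep waypoints where the target was observed; prefer the hinted room if it matches."""
    seen = [w for w in memory if target in w.objects]
    if hint_room is not None:
        hinted = [w for w in seen if w.room == hint_room]
        if hinted:
            return hinted
    # No hint, or the hint matched nothing: fall back to every candidate.
    return seen


def order_by_distance(waypoints, robot_xy):
    """Visit the nearest candidate waypoint first (straight-line distance)."""
    rx, ry = robot_xy
    return sorted(waypoints, key=lambda w: math.hypot(w.x - rx, w.y - ry))


def docking_goal(robot_xy, target_xy, stop_distance):
    """Return (x, y, yaw): the point on the robot-target line that lies
    stop_distance away from the target, facing the target."""
    rx, ry = robot_xy
    dx, dy = target_xy[0] - rx, target_xy[1] - ry
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        raise ValueError("robot and target coincide; heading is undefined")
    scale = (dist - stop_distance) / dist
    return rx + dx * scale, ry + dy * scale, math.atan2(dy, dx)


# Example data (invented for illustration only).
memory = [
    Waypoint("wp_kitchen", 4.0, 0.0, "kitchen", {"cup", "kettle"}),
    Waypoint("wp_office", 1.0, 0.0, "office", {"cup", "laptop"}),
    Waypoint("wp_hall", 0.5, 0.0, "hall", {"plant"}),
]
candidates = order_by_distance(filter_waypoints(memory, "cup", "kitchen"), (0.0, 0.0))
print([w.name for w in candidates])
print(docking_goal((0.0, 0.0), (3.0, 4.0), 1.0))
```

In this sketch the hint only narrows the candidate set and never empties it, so a wrong or unmatched hint degrades to a plain search over every waypoint where the target was previously seen.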

DONG Min, LAI Youcheng, BI Sheng. Multimodal Scene Memory and Instruction-Prompted Target Navigation[J]. Journal of South China University of Technology (Natural Science Edition), doi: 10.12141/j.issn.1000-565X.250152.

