Multi-Agent Cooperative Control Strategies for Unstructured Post-Catastrophe Environments
1. School of Automotive and Rail Transportation, Tianjin Sino-German University of Applied Sciences, Tianjin 300350;
2. School of Transportation Science and Engineering, Beihang University, Beijing 100191;
3. CCCC Intelligent Transportation Co., Ltd., Tianjin 300202
Published online: 2026-04-22
WANG Lei, ZHANG Heng, WANG Xiuying, et al. Multi-Agent Cooperative Control Strategies for Unstructured Post-Catastrophe Environments[J]. Journal of South China University of Technology (Natural Science Edition), 0: 1. DOI: 10.12141/j.issn.1000-565X.260038
Autonomous multi-agent swarms play a vital role in post-disaster emergency search and rescue. In severely occluded, unstructured environments such as ruins, however, they are constrained both by each agent's limited observations and by the risk of inter-agent collisions. Conventional deep reinforcement learning methods struggle to balance exploration efficiency against safety, which tends to produce either catastrophic collisions or overly conservative avoidance. To address these issues, this paper proposes a hierarchical safe cooperative control framework that integrates spatiotemporal memory with optimization theory and decouples multi-agent coordination into two levels. The memory-based high-level policy combines gated recurrent units (GRUs) with centralized training and decentralized execution (CTDE), so that each agent extracts key spatiotemporal features from its observation history and copes with perceptual uncertainty in ruins. The optimization-based low-level safety layer uses control barrier functions (CBFs) to build a real-time safety filter, formulated as a quadratic program (QP), that keeps agent states within a safe invariant set without retraining the network and thereby provides a theoretical safety guarantee in dynamic unstructured environments. Simulations across scenarios of varying difficulty with 10 to 20 obstacles show robust performance: the task success rate stays between 87.6% and 93.4%, approximately 17.8% higher than the baseline; the average collision rate falls by 41.1% to 52.1%; and the overall average reward rises by 29.1% to 33.1%. These results indicate that the framework scales well and remains robust and adaptable in heterogeneous, densely cluttered environments, offering a reliable approach to safe and efficient autonomous cooperative rescue.
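The low-level safety filter described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes single-integrator dynamics (the velocity command is applied directly), one circular obstacle, and the barrier h(x) = |x - c|^2 - r^2, none of which are specified in the abstract. With a single linear constraint the quadratic program has a closed-form solution (projection onto a half-space); a setting with many obstacles and neighbouring agents would need a general QP solver instead. The names `cbf_safety_filter`, `nominal_policy`, and `simulate`, as well as the gain `alpha`, are hypothetical, and `nominal_policy` is only a stand-in for the learned GRU policy.

```python
import math


def cbf_safety_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that h(x) = |x - center|^2 - radius^2 stays >= 0.

    Solves  min_u |u - u_nom|^2  s.t.  grad_h(x) . u >= -alpha * h(x)
    for the assumed dynamics x_dot = u. With one linear constraint the QP
    reduces to a closed-form projection onto a half-space.
    """
    dx, dy = x[0] - center[0], x[1] - center[1]
    h = dx * dx + dy * dy - radius * radius   # barrier value
    gx, gy = 2.0 * dx, 2.0 * dy               # gradient of h
    residual = gx * u_nom[0] + gy * u_nom[1] + alpha * h
    grad_sq = gx * gx + gy * gy
    if residual >= 0.0 or grad_sq == 0.0:
        # Constraint already holds, or the gradient vanishes (agent exactly at
        # the obstacle center, where the QP is infeasible): keep u_nom.
        return (u_nom[0], u_nom[1])
    lam = -residual / grad_sq                 # positive correction multiplier
    return (u_nom[0] + lam * gx, u_nom[1] + lam * gy)


def nominal_policy(x, goal, max_speed=1.0):
    """Hypothetical stand-in for the high-level policy: head straight to the goal."""
    ex, ey = goal[0] - x[0], goal[1] - x[1]
    dist = math.hypot(ex, ey)
    if dist > max_speed:
        ex, ey = ex * max_speed / dist, ey * max_speed / dist
    return (ex, ey)


def simulate(use_filter, steps=2000, dt=0.01):
    """Euler rollout; returns (final position, minimum barrier value observed)."""
    x, goal = (-2.0, 0.2), (2.0, 0.0)
    center, radius = (0.0, 0.0), 0.5
    min_h = float("inf")
    for _ in range(steps):
        u = nominal_policy(x, goal)
        if use_filter:
            u = cbf_safety_filter(x, u, center, radius)
        x = (x[0] + dt * u[0], x[1] + dt * u[1])
        h = (x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2 - radius ** 2
        min_h = min(min_h, h)
    return x, min_h


if __name__ == "__main__":
    for use_filter in (False, True):
        final, min_h = simulate(use_filter)
        print(f"filter={use_filter}: final={final}, min barrier value={min_h:.4f}")
```

In this toy setting the nominal policy is never retrained or changed: the filter only overrides the component of the command that would violate the barrier condition, which is the property the abstract attributes to the low-level layer. A negative minimum barrier value means the agent entered the obstacle.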