Journal of South China University of Technology (Natural Science Edition)

• Computer Science and Technology

基于邻居信息聚合的无配对跨模态检索重排序

沃焱  梁展扬   

  1. 华南理工大学 计算机科学与工程学院,广东 广州 510006

  • 发布日期:2025-06-03

Unpaired Cross-Modal Retrieval Re-ranking Based on Neighbor Information Aggregation

WO Yan  LIANG Zhanyang    

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China


  • Published: 2025-06-03

摘要:

重排序方法作为一种后处理技术,在跨模态检索任务中体现出显著的效果,通过挖掘与处理初始排序列表之间的信息,有效地提高了检索的准确性。目前主流的跨模态检索重排序方法是在数据集有配对的情况下对初始列表实现重排序,但灵活性差,使用时需要对原来的框架进行修改并重新训练,无法灵活地迁移到其他框架上;此外无法在无配对情景下应用,目前跨模态检索依赖大规模配对数据集取得了显著的进展,忽视了实际场景中标注大规模数据集需耗费大量资源的问题。为此,本文提出了一种基于邻居信息聚合的无配对跨模态检索重排序方法,通过挖掘并利用样本的邻居信息使错误的答案远离查询输入,其通过搜索欧氏邻域中的局部邻居以及基于协同表达搜索全局邻居表达样本的邻居信息,并融合两种邻居信息生成新特征与检索输入重新计算语义相似性完成重排序。本文方法置于多种跨模态检索框架后作为后处理方法,并在MSCOCO数据集上进行实验,体现了本方法的有效性与相对其他重排序方法的优越性。

关键词: 跨模态检索, 重排序方法, 邻居信息聚合, 全局语义邻居, 局部语义邻居

Abstract:

Re-ranking is a post-processing technique that has proven highly effective in cross-modal retrieval: by mining and processing the information contained in the initial ranking list, it improves retrieval accuracy. Current mainstream cross-modal re-ranking methods re-rank the initial list under the assumption that the dataset is paired. These methods are inflexible: using them requires modifying the original framework and retraining it, so they cannot be readily transferred to other frameworks. They also cannot be applied in unpaired scenarios. Although cross-modal retrieval has made significant progress by relying on large-scale paired datasets, this progress overlooks the substantial resources that annotating such datasets requires in practice. To address these issues, this paper proposes an unpaired cross-modal retrieval re-ranking method based on neighbor information aggregation. The method mines and exploits the neighbor information of samples to push incorrect answers away from the query. It represents a sample's neighbor information by searching for local neighbors in the Euclidean neighborhood and for global neighbors via collaborative representation, fuses the two kinds of neighbor information into new features, and recomputes the semantic similarity between these features and the query to complete the re-ranking. The method is placed after several cross-modal retrieval frameworks as a post-processing step and evaluated on the MSCOCO dataset; the experiments demonstrate its effectiveness and its superiority over other re-ranking methods.

Key words: cross-modal retrieval, re-ranking method, neighbor information aggregation, global semantic neighbors, local semantic neighbors
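
The pipeline summarized in the abstract (local neighbors in the Euclidean neighborhood, global neighbors via collaborative representation, fusion into new features, similarity recomputation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the mean aggregation of the k nearest neighbors, the ridge-regularized leave-one-out reconstruction used as the collaborative representation, the linear fusion rule, and every name and parameter (`rerank`, `k`, `lam`, `alpha`, `beta`) are assumptions made for illustration. It assumes only that some pretrained cross-modal model has already embedded queries and gallery items in a shared space; no query-gallery pairing is used.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero rows)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def local_neighbor_features(gallery, k):
    """Mean of each sample's k nearest neighbors in Euclidean distance.

    Mean aggregation is an assumption; the abstract does not specify it.
    """
    gallery = np.asarray(gallery, dtype=float)
    sq = np.sum(gallery ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * gallery @ gallery.T
    np.fill_diagonal(d2, np.inf)           # a sample is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]    # indices of the k closest samples
    return gallery[idx].mean(axis=1)


def global_neighbor_features(gallery, lam):
    """Reconstruct each sample from all other samples (ridge regression).

    Solves min_w ||x_i - D^T w||^2 + lam * ||w||^2, where the rows of D
    are the other samples; this is one common form of collaborative
    representation and is assumed here, not taken from the paper.
    """
    gallery = np.asarray(gallery, dtype=float)
    n = gallery.shape[0]
    gram = gallery @ gallery.T
    out = np.empty_like(gallery)
    for i in range(n):
        mask = np.arange(n) != i
        w = np.linalg.solve(gram[np.ix_(mask, mask)] + lam * np.eye(n - 1),
                            gram[mask, i])
        out[i] = w @ gallery[mask]         # weighted sum of all other samples
    return out


def rerank(query, gallery, k=5, lam=1.0, alpha=0.5, beta=0.25):
    """Return, per query, gallery indices sorted by recomputed similarity.

    alpha and beta are hypothetical fusion weights for the local and
    global neighbor features respectively.
    """
    query = l2_normalize(np.asarray(query, dtype=float))
    gallery = l2_normalize(np.asarray(gallery, dtype=float))
    local = local_neighbor_features(gallery, k)
    glob = global_neighbor_features(gallery, lam)
    fused = l2_normalize((1.0 - alpha - beta) * gallery
                         + alpha * local + beta * glob)
    sims = query @ fused.T                 # cosine similarity to new features
    return np.argsort(-sims, axis=1)
```

Because the sketch operates only on precomputed embeddings, it can sit behind any retrieval model without retraining, which is the plug-in property the abstract emphasizes. The per-sample linear solve is cubic in the gallery size, so a practical version would restrict it to the top candidates of the initial ranking list.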