Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 11: 18-26. doi: 10.12141/j.issn.1000-565X.240598

• Computer Science and Technology •

Unpaired Cross-Modal Retrieval Re-Ranking Based on Neighbor Information Aggregation

WO Yan, LIANG Zhanyang   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2024-12-25 Online: 2025-06-03 Published: 2025-11-25
  • About author: WO Yan (born 1975), female, Ph.D., professor; her research focuses on multimedia application technology. E-mail: woyan@scut.edu.cn
  • Supported by:
    the Natural Science Foundation of Guangdong Province (2025A1515011905)

Abstract:

As a post-processing technique, re-ranking has proven effective in cross-modal retrieval: by mining and processing the information shared among initial ranking lists, it improves retrieval accuracy. Current mainstream cross-modal re-ranking methods re-rank the initial list under the assumption that the dataset is paired. They are inflexible, since using them requires modifying the original framework and retraining it, so they do not transfer readily to other frameworks; moreover, they cannot be applied in unpaired scenarios. Cross-modal retrieval has made significant progress by relying on large-scale paired datasets, but this overlooks the substantial resources needed to annotate such datasets in practice. To address these issues, this paper proposes an unpaired cross-modal retrieval re-ranking method based on neighbor information aggregation. The method mines and exploits the neighbor information of samples so that incorrect answers are pushed away from the query. It searches for local neighbors within a Euclidean neighborhood and for global neighbors via collaborative representation, fuses the two kinds of neighbor information into new features, and then recomputes the semantic similarity to the query to complete the re-ranking. The method is applied as a post-processing step to several cross-modal retrieval frameworks and evaluated on the MSCOCO dataset; the results demonstrate its effectiveness and its superiority over other re-ranking methods.

Key words: cross-modal retrieval, re-ranking method, neighbor information aggregation, global semantic neighbor, local semantic neighbor
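
The pipeline summarized in the abstract (local Euclidean neighbors, global neighbors from collaborative representation, fusion, re-scoring) can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so every concrete choice here is an assumption made for illustration rather than the authors' implementation: the function names, the neighbor count `k`, the ridge weight `lam`, the fusion weights `alpha` and `beta`, the weighted-sum fusion, and cosine similarity as the scoring function are all hypothetical. The sketch also assumes that query and candidate features already lie in a shared embedding space produced by some existing retrieval model.

```python
import numpy as np


def local_neighbor_feature(gallery: np.ndarray, k: int) -> np.ndarray:
    """Mean of each item's k nearest Euclidean neighbors (self excluded)."""
    sq = np.sum(gallery ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * gallery @ gallery.T
    np.fill_diagonal(dist, np.inf)  # an item is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return gallery[idx].mean(axis=1)


def global_neighbor_feature(gallery: np.ndarray, lam: float) -> np.ndarray:
    """Reconstruct each item from all other items using ridge-regularized
    (collaborative-representation) coefficients."""
    n = gallery.shape[0]
    gram = gallery @ gallery.T
    out = np.empty_like(gallery)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # Solve (D D^T + lam I) c = D x_i, where D stacks the other items.
        lhs = gram[np.ix_(others, others)] + lam * np.eye(n - 1)
        coef = np.linalg.solve(lhs, gram[others, i])
        out[i] = coef @ gallery[others]
    return out


def l2_normalize(x: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, 1e-12)


def rerank(query, gallery, k=2, lam=0.1, alpha=0.5, beta=0.5):
    """Return (candidate order, scores) after neighbor aggregation.

    The weighted sum below is an assumed fusion rule, not taken from the paper.
    """
    query = np.asarray(query, dtype=float)
    gallery = np.asarray(gallery, dtype=float)
    local = local_neighbor_feature(gallery, k)
    glob = global_neighbor_feature(gallery, lam)
    fused = gallery + alpha * local + beta * glob
    scores = l2_normalize(fused) @ l2_normalize(query)  # cosine similarity
    return np.argsort(-scores), scores
```

In use, `gallery` would hold the features of the top candidates from an initial ranking and `query` the feature of the retrieval input; because the sketch only consumes features, it needs no retraining of the underlying model, which is the flexibility the abstract emphasizes.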

CLC Number: