Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (11): 18-26. doi: 10.12141/j.issn.1000-565X.240598

• Computer Science & Technology •

Unpaired Cross-Modal Retrieval Re-Ranking Based on Neighbor Information Aggregation

WO Yan, LIANG Zhanyang   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2024-12-25 Online: 2025-11-25 Published: 2025-06-03
  • About author: WO Yan (born 1975), female, Ph.D., professor; her main research interest is multimedia application technology. E-mail: woyan@scut.edu.cn
  • Supported by:
    the Natural Science Foundation of Guangdong Province (2025A1515011905)

Abstract:

As a post-processing technique, re-ranking has proven highly effective in cross-modal retrieval: by mining the information contained in the initial ranking lists, it improves retrieval accuracy. Current mainstream cross-modal re-ranking methods re-rank the initial list using paired datasets. They are inflexible, however, because they cannot be plugged into an existing system without modifying the original framework and retraining it, which makes them difficult to transfer to other frameworks. Moreover, they cannot be applied in unpaired scenarios. Cross-modal retrieval has made significant progress by relying on large-scale paired datasets, but this overlooks the substantial resources that labeling such datasets requires in practice. To address these issues, this paper proposes an unpaired cross-modal retrieval re-ranking method based on neighbor information aggregation. The method improves retrieval performance by mining and exploiting the neighbor information of samples, pushing incorrect answers away from the query. It finds local neighbors within a Euclidean neighborhood and obtains a global neighbor representation through collaborative representation, then fuses the two types of neighbor information into new features that are used to recompute semantic similarity with the query, completing the re-ranking. Finally, the proposed method is applied as a post-processing step to several cross-modal retrieval frameworks and evaluated on the MSCOCO dataset, where it demonstrates its effectiveness and its superiority over other re-ranking methods.
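
The pipeline outlined in the abstract (local neighbors, a global neighbor representation, fusion, and re-scoring) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the mean-of-k-nearest local feature, the ridge-regularized collaborative representation, the linear fusion, and every name and parameter here (`rerank`, `k`, `lam`, `alpha`, `beta`) are assumptions made for illustration. The sketch also assumes that the query and the candidate items have already been embedded in a shared space by some pre-trained retrieval model.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def local_neighbor_feature(gallery, k):
    """Assumed local feature: mean of each item's k nearest items (Euclidean)."""
    sq = np.sum(gallery ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * gallery @ gallery.T
    np.fill_diagonal(dist, np.inf)  # an item is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return gallery[idx].mean(axis=1)


def global_neighbor_feature(gallery, lam):
    """Assumed global feature: reconstruct each item from all other items by
    solving min_w ||x_i - D w||^2 + lam * ||w||^2 (collaborative representation)."""
    n = gallery.shape[0]
    gram = gallery @ gallery.T
    out = np.empty_like(gallery)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        a = gram[np.ix_(others, others)] + lam * np.eye(n - 1)
        w = np.linalg.solve(a, gram[others, i])  # closed-form ridge solution
        out[i] = w @ gallery[others]
    return out


def rerank(query, gallery, k=3, lam=0.1, alpha=0.5, beta=0.25):
    """Fuse original, local, and global features, then re-score against the query.

    The fusion weights alpha and beta are hypothetical and chosen arbitrarily.
    Returns candidate indices in descending order of the new similarity.
    """
    gallery = l2_normalize(np.asarray(gallery, dtype=float))
    query = l2_normalize(np.asarray(query, dtype=float))
    local = local_neighbor_feature(gallery, k)
    glob = global_neighbor_feature(gallery, lam)
    fused = l2_normalize(alpha * gallery + beta * local + (1 - alpha - beta) * glob)
    scores = fused @ query
    return np.argsort(-scores), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = rng.normal(size=(8, 5))  # stand-in for top-N candidate embeddings
    q = rng.normal(size=5)                # stand-in for the query embedding
    order, scores = rerank(q, candidates)
    print(order)
```

Because the sketch only reads embeddings and returns a new ordering, it needs neither paired labels nor retraining of the underlying model, which is the plug-in property the abstract emphasizes.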

Key words: cross-modal retrieval, re-ranking method, neighbor information aggregation, global semantic neighbor, local semantic neighbor

CLC Number: