Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (11): 1-. doi: 10.12141/j.issn.1000-565X.240598

• Computer Science & Technology •    

Unpaired Cross-Modal Retrieval Re-ranking Based on Neighbor Information Aggregation

WO Yan  LIANG Zhanyang    

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China


  • Online: 2025-11-25 Published: 2025-06-03

Abstract:

Re-ranking is a post-processing technique that has proven highly effective in cross-modal retrieval tasks. By mining and processing the information contained in the initial ranking list, re-ranking methods improve retrieval accuracy. Currently, mainstream cross-modal retrieval re-ranking methods re-rank the initial list using paired datasets. However, they lack flexibility: they cannot be plugged into an existing system without modifying the original framework and retraining it, which makes them difficult to transfer to other frameworks. Moreover, they cannot be applied in unpaired scenarios. Although cross-modal retrieval has made significant progress with large-scale paired datasets, the substantial resources needed to label such datasets in practice are often overlooked. To address these issues, this paper proposes an unpaired cross-modal retrieval re-ranking method based on neighbor information aggregation. The method improves retrieval performance by mining and exploiting the neighbor information of samples, pushing incorrect answers away from the query. It finds local neighbors within a Euclidean neighborhood and obtains a global neighbor representation through collaborative representation, then fuses these two types of neighbor information to generate new features, which are used to recompute semantic similarity with the query and complete the re-ranking. Applied as a post-processing step to several cross-modal retrieval frameworks and evaluated on the MSCOCO dataset, the method proves effective and outperforms other re-ranking methods.

Key words: cross-modal retrieval, re-ranking method, neighbor information aggregation, global semantic neighbors, local semantic neighbors
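
The abstract gives no formulas, so the pipeline it describes can only be illustrated loosely. The following is a minimal sketch under stated assumptions, not the authors' implementation: local neighbors are taken as the k nearest gallery items in Euclidean distance and averaged, the global neighbor representation is assumed to be a ridge-regularized reconstruction of each item from all other gallery items (one common form of collaborative representation), and the two are fused by a weighted sum. The function names and the parameters `k`, `lam`, `alpha`, and `beta` are hypothetical. Only gallery-side features are used, so no paired labels are required.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def local_neighbor_features(gallery, k):
    """Average the k nearest gallery items (Euclidean) for each item."""
    sq = np.sum(gallery ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * gallery @ gallery.T
    np.fill_diagonal(dist, np.inf)          # an item is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]   # (n, k) neighbor indices
    return gallery[idx].mean(axis=1)        # (n, d)


def global_neighbor_features(gallery, lam):
    """Reconstruct each item from all others via ridge regression.

    Assumed form of collaborative representation:
    w = argmin ||x_i - G_o^T w||^2 + lam * ||w||^2
    """
    n = gallery.shape[0]
    gram = gallery @ gallery.T
    out = np.empty_like(gallery)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        a = gram[np.ix_(others, others)] + lam * np.eye(n - 1)
        b = gram[others, i]
        w = np.linalg.solve(a, b)           # closed-form ridge solution
        out[i] = w @ gallery[others]
    return out


def rerank(query, gallery, k=3, lam=0.1, alpha=0.5, beta=0.5):
    """Return gallery indices sorted by similarity to the query, and the scores."""
    query = l2_normalize(np.asarray(query, dtype=float))
    gallery = l2_normalize(np.asarray(gallery, dtype=float))
    local = local_neighbor_features(gallery, k)
    glob = global_neighbor_features(gallery, lam)
    # Assumed fusion rule: weighted sum of original, local, and global features.
    fused = l2_normalize(gallery + alpha * local + beta * glob)
    scores = fused @ query                  # cosine similarity with the query
    return np.argsort(-scores), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery_feats = rng.normal(size=(8, 4))  # e.g. image embeddings
    query_feat = rng.normal(size=4)          # e.g. a text embedding
    order, _ = rerank(query_feat, gallery_feats)
    print(order)
```

Because the sketch touches only the extracted features, it can sit behind any retrieval model that exposes embeddings in a shared space, which matches the plug-in, post-processing role the abstract describes. Setting `alpha` and `beta` to zero recovers the plain cosine ranking.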