Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (9): 59-67. doi: 10.12141/j.issn.1000-565X.240499

• Computer Science & Technology •

Information Retrieval Re-ranking Method Based on Bidirectional Text Expansion

TU Xinhui (涂新辉)  GUO Cong (郭聪)  ZONG Yuhang (宗宇航)

  1. School of Computer Science, Central China Normal University, Wuhan 430079, Hubei, China
  • Publication date: 2025-09-25  Release date: 2025-01-17

Abstract: With the rapid development of large language models (LLMs), both text matching and text expansion techniques in information retrieval have advanced significantly. Query expansion and document expansion are two important ways of enriching text representations in information retrieval, and current mainstream text expansion methods all rely on LLMs. However, LLM-generated text differs markedly from human-written text in linguistic diversity and style, and this mismatch may distort the computation of query-document relevance and ultimately lower the accuracy of the whole retrieval process. To address this problem, this paper proposes an information retrieval method based on bidirectional text expansion (BTE). First, zero-shot prompting is used to have an LLM generate pseudo-queries for documents and pseudo-documents for queries; then, the semantic similarity between the pseudo-queries and the pseudo-documents is computed; finally, the original query-document similarity score and the pseudo-query/pseudo-document semantic similarity are combined by weighted fusion to produce the final document ranking. Experiments on two public datasets, DL19 and DL20, show that BTE significantly outperforms the baseline models on several evaluation metrics, including NDCG@10, P@10 and MRR@10. The proposed bidirectional text expansion method can therefore further strengthen relevance matching between queries and documents, bringing some improvement to the performance of the overall information retrieval system.
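
The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the similarity function or the fusion formula, so everything here is an assumption made for illustration: cosine similarity over precomputed embedding vectors, a linear interpolation with a hypothetical weight `alpha`, and the function name `bte_rerank`. The LLM generation step is taken as already done, with the pseudo-document and pseudo-queries supplied as vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bte_rerank(
    original_scores: Dict[str, float],
    pseudo_doc_vec: Sequence[float],
    pseudo_query_vecs: Dict[str, Sequence[float]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Fuse original and pseudo-text similarities, then sort by the fused score.

    alpha is a hypothetical interpolation weight; the abstract only says
    the two scores are "weighted and fused", not how.
    """
    fused = {}
    for doc_id, orig in original_scores.items():
        # Similarity between the query's pseudo-document and this document's pseudo-query
        pseudo = cosine(pseudo_doc_vec, pseudo_query_vecs[doc_id])
        fused[doc_id] = alpha * orig + (1.0 - alpha) * pseudo
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: d2 has the lower original score, but its pseudo-query
# matches the query's pseudo-document, so fusion moves it to the top.
original = {"d1": 0.60, "d2": 0.55}
pseudo_doc = [1.0, 0.0]
pseudo_queries = {"d1": [0.0, 1.0], "d2": [1.0, 0.0]}
ranking = bte_rerank(original, pseudo_doc, pseudo_queries, alpha=0.5)
print(ranking)
```

With `alpha=1.0` the sketch falls back to the original ranking, which makes the contribution of the pseudo-text similarity easy to isolate. In practice the two scores may sit on different scales, so some normalization before fusion would likely be needed; that choice is also not specified in the abstract.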

Key words: information retrieval, large language model, query expansion, document expansion
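
For readers unfamiliar with the evaluation metrics named in the abstract, the following Python sketch shows one common way to compute P@10, MRR@10 and NDCG@10 from a ranked list of graded relevance labels. It is an illustration under stated assumptions, not the paper's evaluation code: the linear gain in DCG, the relevance threshold `min_rel`, and the shortcut of deriving the ideal ranking from the same list (which presumes the list holds every judged document) are all choices made here.

```python
import math
from typing import List


def precision_at_k(rels: List[int], k: int = 10, min_rel: int = 1) -> float:
    """Fraction of the top-k results whose relevance grade is at least min_rel."""
    return sum(1 for r in rels[:k] if r >= min_rel) / k


def mrr_at_k(rels: List[int], k: int = 10, min_rel: int = 1) -> float:
    """Reciprocal rank of the first relevant result in the top k, else 0.0."""
    for rank, r in enumerate(rels[:k], start=1):
        if r >= min_rel:
            return 1.0 / rank
    return 0.0


def dcg_at_k(rels: List[int], k: int = 10) -> float:
    """Discounted cumulative gain with linear gain (an assumed variant)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: List[int], k: int = 10) -> float:
    """DCG normalized by the DCG of the same labels sorted in descending order."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal else 0.0


# Hypothetical relevance grades for one query, in ranked order
rels = [0, 2, 1, 0, 0, 0, 0, 0, 0, 0]
print(precision_at_k(rels), mrr_at_k(rels), ndcg_at_k(rels))
```

The per-query values would then be averaged over all queries in a dataset such as DL19 or DL20 to obtain the reported figures.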