Text Semantic Similarity Model Based on Ranking Distillation and Difference Prediction
Published online: 2026-01-20
CAI Xiaodong, TAN Yuanhao. Text Semantic Similarity Model Based on Ranking Distillation and Difference Prediction[J]. Journal of South China University of Technology (Natural Science Edition), 0: 1. DOI: 10.12141/j.issn.1000-565X.250448
In text semantic similarity models based on unsupervised contrastive learning, existing approaches typically just divide texts into positive and negative samples, and training focuses only on the overall features of each text. This design has two clear limitations: it ignores fine-grained ranking relations between texts, making it difficult to distinguish gradations of similarity, and it leaves the model insensitive to semantic changes between sentences, so the similarity between texts cannot be captured accurately. To mine fine-grained relationships between samples and strengthen the model's perception of semantic change, this paper proposes a text semantic similarity model based on ranking distillation and difference prediction. First, coarse-grained ranking features are extracted from a pre-trained teacher model and distilled into the student model, enabling the student to capture fine-grained ranking features. Second, a difference-prediction auxiliary network is designed: the original text is randomly masked, a generator reconstructs the masked text, and a discriminator then predicts where the reconstructed text differs from the original, so the model learns to perceive semantic changes between the original and masked texts. Experimental results on the semantic textual similarity datasets STS12–STS16, STS-B, and SICK-R show that the Spearman correlation coefficient improves over state-of-the-art models by an average of 1.16% and 0.82% on the BERT-base and RoBERTa-base backbones, respectively, demonstrating the effectiveness of the model.
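The ranking-distillation step described above can be illustrated with a minimal listwise sketch: the teacher's similarity scores over a candidate set are turned into a soft ranking distribution, and the student is penalized (via KL divergence) for deviating from it. This is an assumption about the mechanics, not the paper's exact formulation; the function name, the temperature value, and the use of NumPy here are all illustrative.

```python
import numpy as np

def softmax(x, tau):
    """Temperature-scaled softmax over a 1-D score vector."""
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def ranking_distillation_loss(teacher_sims, student_sims, tau=0.05):
    """Listwise distillation sketch: KL divergence between the
    teacher's and the student's similarity distributions over the
    same candidate texts. The teacher's coarse ranking acts as a
    soft target the student must reproduce."""
    p = softmax(teacher_sims, tau)   # teacher ranking distribution
    q = softmax(student_sims, tau)   # student ranking distribution
    eps = 1e-12                      # avoid log(0)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

When the student's scores induce the same ranking distribution as the teacher's, the loss is zero; the more the student's ordering disagrees, the larger the penalty, which is what lets distillation transfer graded (rather than binary positive/negative) similarity structure.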
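The difference-prediction setup (mask, reconstruct, then detect differences) can be sketched at the data level as follows. This is a toy illustration only: the "generator" is a stand-in that samples a random vocabulary token for each masked position, and the function name and masking probability are hypothetical. The key point is how the discriminator's per-token target is built: 1 where the reconstructed token differs from the original, 0 elsewhere.

```python
import random

def make_difference_targets(tokens, mask_prob=0.15, seed=0):
    """Toy sketch of the difference-prediction objective:
    randomly mask tokens, let a placeholder generator fill the
    masked slots, and label each position 1 if the reconstructed
    token differs from the original (the discriminator's target)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))      # stand-in vocabulary
    reconstructed = []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Placeholder for the generator's prediction at a
            # masked position; it may happen to restore the
            # original token, in which case the target stays 0.
            reconstructed.append(rng.choice(vocab))
        else:
            reconstructed.append(tok)
    targets = [int(r != o) for r, o in zip(reconstructed, tokens)]
    return reconstructed, targets
```

Training the discriminator on these per-token targets forces the encoder to notice exactly where the reconstructed text drifts from the original, which is the semantic-change sensitivity the abstract argues plain contrastive objectives lack.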