Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (3): 50-56. doi: 10.12141/j.issn.1000-565X.240159

• Computer Science & Technology •

Contrastive Learning Model Based on Text-Visual and Information Entropy Minimization

CAI Xiaodong1, DONG Lifang1, HUANG Yeyang1, ZHOU Li2

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
    2. Nanning West Bund Fenggu Business Data Co., Ltd., Nanning 530008, Guangxi, China
  • Received: 2024-04-07 Online: 2025-03-10 Published: 2024-09-13
  • Supported by:
    the Guangxi Innovation-Driven Development Project (AA20302001)

Abstract:

Current unsupervised contrastive learning methods rely mainly on purely textual information to construct sentence embeddings, which limits their ability to capture the deeper meanings a sentence conveys. Meanwhile, traditional contrastive learning methods focus excessively on maximizing the mutual information between positive text instances, overlooking potential noise within the sentence embeddings. To retain the useful information in the text while suppressing this noise, this paper proposes a contrastive learning model based on text-visual information and information entropy minimization. First, text and its corresponding visual information are deeply fused within the contrastive learning framework and jointly mapped into a unified grounding space, where their representations are kept consistent. This overcomes the limitation of learning sentence embeddings from textual information alone and makes the contrastive learning process more comprehensive and precise. Second, following the information minimization principle, the model reconstructs positive text instances under an information entropy minimization constraint while still maximizing the mutual information between them. Experimental results on the standard semantic textual similarity (STS) tasks show that the proposed model achieves significant improvements in the Spearman correlation coefficient, demonstrating a notable advantage over existing state-of-the-art methods and confirming its effectiveness.
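The two objectives sketched in the abstract (a contrastive term that maximizes mutual information between paired text and visual embeddings, plus an entropy penalty on the embeddings) can be illustrated roughly as follows. This is a minimal NumPy sketch of one plausible reading of the abstract, not the paper's implementation; the InfoNCE formulation, the softmax-based entropy proxy, the function names, and the 0.1 entropy weight are all assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Normalize rows to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(anchors, positives, temperature=0.05):
    """InfoNCE loss: a standard lower bound on the mutual information
    between paired embeddings. Other items in the batch act as negatives."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                      # (N, N) similarity matrix
    # Cross-entropy with the diagonal (the true pair) as the target class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -log_probs[idx, idx].mean()

def embedding_entropy(x):
    """Shannon entropy of each softmax-normalized embedding; minimizing it
    concentrates the embedding's information, a rough proxy for the
    entropy-minimization step described in the abstract."""
    p = np.exp(x - x.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1).mean()

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))                  # toy text embeddings
vision = text + 0.1 * rng.normal(size=(8, 16))   # matched visual embeddings

# Joint objective: cross-modal contrastive term + weighted entropy penalty.
loss = info_nce(text, vision) + 0.1 * embedding_entropy(text)
```

With matched text-visual pairs the contrastive term is small, while shuffling the pairing drives it up, which is what makes the objective usable as a training signal.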

Key words: unsupervised contrastive learning, mutual information, text-visual, information entropy minimization, semantic textual similarity
