Journal of South China University of Technology (Natural Science Edition), 2026, Vol. 54, Issue (2): 16-24. doi: 10.12141/j.issn.1000-565X.250145

• Computer Science and Technology •

Incomplete Multi-View Clustering Algorithm Based on Sample Complementary Anchor Graph

LIU Xiaolan1, XU Yuhong2

  1. School of Mathematics, South China University of Technology, Guangzhou 510640, Guangdong, China
  2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2025-05-19  Online: 2025-07-18  Published: 2026-02-25
  • About the author: LIU Xiaolan (born 1979), female, Ph.D., professor; her research focuses on optimization algorithms and machine learning. E-mail: liuxl@scut.edu.cn
  • Supported by:
    the National Social Science Foundation of China (21BTJ069); the Guangdong Provincial First-Class Blended Online and Offline Course Program (Yuejiaogaohan [2023] No. 33)

Abstract:

With the widespread application of multi-view data in real-world scenarios, clustering with incomplete views has become a significant challenge in machine learning. Traditional anchor graph-based clustering algorithms build their anchor graphs only from complete instances, i.e., samples observed in every view. Under high missing rates this leaves too few anchors to capture the underlying data structure, while under low missing rates the advantages of anchors are not fully exploited. To address three limitations of these methods, namely restricted anchor selection, rigid weight assignment, and high computational complexity, this paper proposes an incomplete multi-view clustering algorithm based on sample-complementary anchor graphs (IMVC-SAC). First, a cross-view anchor complementation mechanism adaptively selects anchors from both shared samples and view-specific samples, improving the representation of the data structure under high missing rates. Second, a missing-pattern-aware weighting model adjusts the contribution of each view to the similarity matrix according to the missing pattern and missing degree of each sample. Finally, by exploiting the factorizability of doubly stochastic non-negative matrices, the time complexity of spectral clustering is reduced from cubic to linear in the number of samples. Experimental results on five public datasets show that IMVC-SAC outperforms state-of-the-art methods in clustering performance and maintains good clustering results even at high missing rates, confirming its robustness and effectiveness.
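The linear-time claim for spectral clustering can be illustrated with the standard anchor-graph construction. What follows is a minimal sketch under assumed choices, not the authors' IMVC-SAC: each sample is linked to its k nearest anchors with Gaussian weights normalized so that every row of the n x m anchor graph Z sums to 1. The helpers `anchor_graph` and `spectral_embedding_from_anchors` and the parameters `k` and `sigma` are hypothetical. With Λ = diag(Z^T 1), the similarity S = Z Λ^(-1) Z^T is non-negative, symmetric and doubly stochastic, because S 1 = Z Λ^(-1) (Z^T 1) = Z 1 = 1. Since S = B B^T with B = Z Λ^(-1/2), the leading eigenvectors of S are the left singular vectors of B. An SVD of the n x m matrix B yields them in O(n m^2) time, linear in n, without ever forming the n x n matrix S.

```python
import numpy as np


def anchor_graph(X, anchors, k=3, sigma=1.0):
    """Row-stochastic n x m anchor graph (assumed construction: k nearest
    anchors per sample, Gaussian weights, each row normalized to sum to 1)."""
    # Squared Euclidean distances between samples and anchors, shape (n, m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]            # k nearest anchors, closest first
    rows = np.arange(X.shape[0])[:, None]
    near = d2[rows, idx]
    # Shift by the closest distance so exp() cannot underflow to an all-zero row;
    # the shift cancels after row normalization.
    w = np.exp(-(near - near[:, :1]) / (2.0 * sigma ** 2))
    Z = np.zeros_like(d2)
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)
    return Z


def spectral_embedding_from_anchors(Z, c):
    """Top-c eigenpairs of S = Z diag(Z^T 1)^(-1) Z^T without forming the n x n S."""
    lam = Z.sum(axis=0)                            # anchor degrees, Z^T 1
    Z, lam = Z[:, lam > 0], lam[lam > 0]           # drop anchors that no sample uses
    B = Z / np.sqrt(lam)                           # B = Z Lambda^(-1/2), so S = B B^T
    U, s, _ = np.linalg.svd(B, full_matrices=False)  # O(n m^2): linear in n
    return U[:, :c], s[:c] ** 2                    # eigenvalues of S = squared singular values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two well-separated 2-D blobs with 50 samples each.
    X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
    anchors = X[rng.choice(len(X), size=10, replace=False)]
    F, eigvals = spectral_embedding_from_anchors(anchor_graph(X, anchors), c=2)
    print(F.shape, np.round(eigvals, 6))
```

The rows of the embedding F can then be clustered with k-means to obtain labels. The per-view, missing-pattern-aware weights and the cross-view anchor selection described in the abstract are not reproduced here; the sketch only shows why a similarity of the form Z Λ^(-1) Z^T is doubly stochastic and why its spectral embedding costs time linear in the number of samples.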

Key words: incomplete multi-view clustering, anchor graph, sample complementarity, similarity matrix fusion, spectral clustering
