Sample Complementary Anchor Graph Learning for Incomplete Multi-View Clustering
School of Mathematics, South China University of Technology, Guangzhou 510460, Guangdong, China
Published online: 2025-07-17
Liu Xiaolan, Xu Yuhong. Sample Complementary Anchor Graph Learning for Incomplete Multi-View Clustering[J]. Journal of South China University of Technology (Natural Science Edition), 0: 1. DOI: 10.12141/j.issn.1000-565X.250145
With multi-view data now widely used in real-world applications, clustering under missing views has become an important challenge in machine learning. Traditional anchor graph clustering algorithms build the anchor graph from complete instances only. Under high missing rates, too few anchors are available to represent the data structure; under low missing rates, the advantages of anchors cannot be fully exploited. To address three limitations of these methods (restricted anchor selection, rigid weight assignment, and high computational complexity), this paper proposes the following improvements. First, a cross-view anchor complementarity mechanism adaptively selects anchors from both the samples shared across views and the samples unique to each view, which resolves the insufficient representation under high missing rates. Second, a missing-pattern-aware weighting model adjusts each view's contribution to the similarity matrix according to each sample's missing pattern. Third, the decomposability of doubly stochastic non-negative matrices is exploited to reduce the time complexity of spectral clustering from cubic to linear in the number of samples. Building on these improvements, this paper proposes IMVC-SAC (Incomplete Multi-View Clustering with Sample-Adaptive Complements), an incomplete multi-view clustering algorithm based on sample complementary anchor graph construction. Experiments on five public datasets show that IMVC-SAC outperforms current mainstream algorithms and maintains good clustering performance even at high missing rates, with 70%-90% of the data missing, which supports its robustness and effectiveness.
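The abstract does not spell out how the doubly stochastic structure yields linear-time spectral clustering. The sketch below shows one standard anchor-graph construction that has this property, and it is an assumption for illustration rather than the paper's own formulation. If the sample-to-anchor matrix `Z` (n samples by m anchors) is non-negative with rows summing to 1, then `S = Z diag(Z^T 1)^{-1} Z^T` is symmetric, non-negative, and doubly stochastic, and its leading eigenvectors are the left singular vectors of the thin matrix `B = Z diag(Z^T 1)^{-1/2}`. The SVD of `B` costs O(n m^2), which is linear in n for a fixed number of anchors. The function names, the k-nearest-anchor rule, and the Gaussian weighting are all hypothetical choices.

```python
import numpy as np


def anchor_graph(X, anchors, k=3):
    """Row-stochastic sample-to-anchor graph Z (n x m) over the k nearest anchors.

    Illustrative only: the kernel and bandwidth are assumptions, not the paper's.
    """
    # Squared Euclidean distance from every sample to every anchor (n x m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    n = X.shape[0]
    nearest = np.argsort(d2, axis=1)[:, :k]      # indices of the k closest anchors
    rows = np.arange(n)[:, None]
    dk = d2[rows, nearest]                       # distances to those anchors (n x k)
    sigma = dk.mean(axis=1, keepdims=True) + 1e-12
    w = np.exp(-dk / sigma)                      # Gaussian weights, per-sample bandwidth
    Z = np.zeros_like(d2)
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)  # each row sums to 1
    return Z


def spectral_embedding(Z, n_clusters):
    """Leading eigenpairs of S = Z diag(Z^T 1)^-1 Z^T without forming the n x n matrix S."""
    col_sum = Z.sum(axis=0)
    # B B^T equals S; the guard avoids division by zero for unused anchors.
    B = Z / np.sqrt(np.maximum(col_sum, 1e-12))
    # Thin SVD of an n x m matrix: O(n m^2) instead of O(n^3) for eigendecomposing S.
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :n_clusters], s[:n_clusters] ** 2


# Usage on synthetic data: 200 samples, 20 anchors drawn from the samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
anchors = X[rng.choice(200, size=20, replace=False)]
Z = anchor_graph(X, anchors)
U, eigvals = spectral_embedding(Z, n_clusters=3)
```

The rows of `U` can then be passed to k-means to obtain cluster labels. In an incomplete multi-view setting, a `Z` would be built per view from the samples observed in that view and then fused. That fusion step is where the paper's complementary anchor selection and missing-pattern-aware weights would apply, and it is not reproduced here.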