Journal of South China University of Technology (Natural Science Edition), 2013, Vol. 41, Issue (7): 137-144. doi: 10.3969/j.issn.1000-565X.2013.07.023

• Computer Science and Technology •

Random Subspace-Based Semi-Supervised Dimensionality Reduction for Cancer Classification

Wen Gui-hua(1), Cai Xian-fa(1,2,3)†, Wei Jia(1)

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, Guangdong, China; 3. Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen 518055, Guangdong, China
  • Received: 2013-03-20 Published: 2013-07-25 Online: 2013-06-01
  • Corresponding author: Cai Xian-fa (born 1979), male, part-time Ph.D. candidate and lecturer at Guangdong Pharmaceutical University, mainly researching pattern recognition and bioinformatics. E-mail: cxianfa@126.com
  • About the author: Wen Gui-hua (born 1968), male, professor and Ph.D. supervisor, mainly researching machine learning, knowledge discovery and cognitive geometry. E-mail: crghwen@scut.edu.cn
  • Supported by:

    National Natural Science Foundation of China (61273363, 61070090, 61003174, 60973083)

Abstract:

Accurate cancer classification is essential for the successful diagnosis and treatment of cancers. Semi-supervised dimensionality reduction approaches perform well on clean data sets. In the presence of noise, however, the neighborhood structure constructed by most existing approaches is topologically unstable. To solve this problem, a random subspace-based semi-supervised dimensionality reduction algorithm (RSSSDR) is proposed, which combines random subspaces with semi-supervised dimensionality reduction. The algorithm first builds multiple diverse subgraphs in different random subspaces of a data set. It then combines these subgraphs into a mixture graph on which dimensionality reduction is performed. The edge weights of the neighborhood graph are determined by minimizing the local reconstruction error, so that the global geometric structure of the cancer data is preserved while its local geometric structure is maintained. Experimental results on public cancer data sets show that RSSSDR achieves high classification accuracy and strong robustness to its parameter settings.

Key words: semi-supervised learning, random subspace, cancer classification, dimensionality reduction
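The abstract names the ingredients of RSSSDR but gives no formulas. Those ingredients are random feature subspaces, one neighborhood graph per subspace, a mixture graph built from them, and edge weights obtained by minimizing the local reconstruction error. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation:

* Edge weights are computed as in locally linear embedding (LLE), with sum-to-one weights over the k nearest neighbors.
* The mixture graph is the plain average of the subspace graphs.
* The embedding is the standard LLE one computed from that average.

The function names and the parameters `n_subspaces`, `subspace_dim`, `k` and `reg` are hypothetical. The label-dependent (semi-supervised) part of RSSSDR is omitted, because the abstract does not describe it.

```python
import numpy as np


def reconstruction_weights(X, k, reg=1e-3):
    """LLE-style weights (assumption): row i reconstructs x_i from its k nearest
    neighbours with weights summing to one, minimising local reconstruction error."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a point is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]
        Z = X[nbrs] - X[i]                  # neighbours centred on x_i
        C = Z @ Z.T                         # local Gram matrix (k x k)
        tr = np.trace(C)
        C = C + (reg * tr if tr > 0 else reg) * np.eye(k)  # regularise: C may be singular
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()            # enforce the sum-to-one constraint
    return W


def mixture_graph(X, n_subspaces=10, subspace_dim=None, k=5, seed=0):
    """Hypothetical mixture graph: average of the weight graphs built in
    n_subspaces random feature subspaces (plain averaging is an assumption)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = subspace_dim or max(1, d // 2)
    W = np.zeros((n, n))
    for _ in range(n_subspaces):
        feats = rng.choice(d, size=m, replace=False)   # random subset of features (genes)
        W += reconstruction_weights(X[:, feats], k)
    return W / n_subspaces


def lle_embedding(W, n_components=2):
    """Unsupervised LLE embedding of the mixture graph: minimise
    sum_i ||y_i - sum_j W_ij y_j||^2. RSSSDR's label term is not included."""
    n = W.shape[0]
    I_W = np.eye(n) - W
    M = I_W.T @ I_W                         # symmetric positive semidefinite
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]      # skip the constant bottom eigenvector


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(60, 200))          # toy data: 60 samples, 200 "genes"
    W = mixture_graph(X, n_subspaces=8, subspace_dim=50, k=6)
    Y = lle_embedding(W, n_components=3)
    print(Y.shape)                          # (60, 3)
```

Each subspace graph has rows that sum to one, so their average does too. The mixture graph is therefore itself a valid reconstruction-weight matrix, and the usual LLE eigenproblem still applies to it. Averaging over several random feature subsets is one simple way to make the neighborhood structure less sensitive to noisy features. How RSSSDR actually weights or combines its subgraphs may differ.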