华南理工大学学报(自然科学版) ›› 2013, Vol. 41 ›› Issue (12): 101-106.doi: 10.3969/j.issn.1000-565X.2013.12.017

• 生物工程 • 上一篇    下一篇

用于基因芯片数据分析的模块性图聚类方法

李力 曹以诚 毛晓帆   

  1. 华南理工大学 生物科学与工程学院,广东 广州 510006
  • 收稿日期:2013-04-22 修回日期:2013-09-27 出版日期:2013-12-25 发布日期:2013-11-19
  • 通信作者: 曹以诚(1949-),男,教授,博士生导师,主要从事微分子生物学和生物信息学研究. E-mail:yccao@scut.edu.cn
  • 作者简介:李力(1981-),男,博士生,主要从事基因芯片与生物信息学研究.E-mail:ottolear@gmail.com
  • 基金资助:

    教育部中国网格计划生物信息网格平台子项目(B12137040130)

Modularity- Based Graph Clustering for Analysis of Gene Microarray Data

Li Li Cao Yi- cheng Mao Xiao- fan   

  1. School of Biological Science and Engineering,South China University of Technology,Guangzhou 510006,Guangdong,China
  • Received:2013-04-22 Revised:2013-09-27 Online:2013-12-25 Published:2013-11-19
  • Contact: 曹以诚(1949-),男,教授,博士生导师,主要从事微分子生物学和生物信息学研究. E-mail:yccao@scut.edu.cn
  • About author:李力(1981-),男,博士生,主要从事基因芯片与生物信息学研究.E-mail:ottolear@gmail.com
  • Supported by:

    教育部中国网格计划生物信息网格平台子项目(B12137040130)

摘要: 图聚类是一种重要的聚类算法,可有效应用于蛋白质作用网络和芯片数据聚类等领域.文中针对现有基因芯片数据图聚类方法的不足,提出了一种基于模块性指标和子图平滑度的全局图聚类方法.为防止算法陷入局部最优解,引入子图平滑度的定义,打散每次聚类结果中产生的平滑度较低的子图,再对得到的单节点进行下一次聚类,经多次迭代后得到全局最优的聚类结果.采用一组基因组表达数据,将该方法和其他4 种常用聚类方法(经典图聚类、k 均值、SOM 及Fuzzy 算法)进行比较,结果表明:该方法在聚类过程中的平均类间重叠度和FOM'值总体上优于其他4 种算法,在将数据集分类到最佳聚类数39 时,其FOM'值分别比上述4 种方法低28.41% 、19.21% 、9.84% 和24.67% ;其分类准确度高于Fuzzy 法和SOM 算法,算法执行效率则与SOM 算法相近,比Fuzzy 法高5.94% .

关键词: 基因芯片, 图聚类, 模块性, 平滑度, 算法

Abstract:

As an important clustering algorithm,graph clustering can be effectively applied to protein interactionnetworks and microarray data clustering.In this paper,to overcome the shortcomings of the existing graph cluste-ring methods for gene microarray data,a global graph clustering method based on the modularity and the subgraphsmoothness is proposed.In this algorithm,subgraph smoothness is introduced to avoid the local optimal solution,subgraphs with low smoothness values in the clustering results are split into singletons,and those newly- generatedsingletons are used in the next clustering step.After several iterations,the global optimal clustering result can beobtained.The proposed method is then compared with four commonly- used clustering methods (the classic graphclustering,the k- means algorithm,the SOM algorithm,and the Fuzzy algorithm) on a group of genome expressiondata,and the results show that (1) the proposed method is superior to the other four methods in terms of averagenon- overlap proportion and FOM' value; (2) when the dataset is divided into 39 clusters,the FOM' value of theproposed method is respectively 28.41%,19.21%,9.84% and 24.67% lower than those of the other four me-thods; and (3) the proposed method is of a classification accuracy,which is higher than that of the Fuzzy algorithmand the SOM algorithm,with an execution efficiency similar to that of the SOM algorithm but 5.94% higher thanthat of the Fuzzy method.

Key words: gene microarray, graph clustering, modularity, smoothness, algorithm

中图分类号: