Journal of South China University of Technology (Natural Science Edition) ›› 2017, Vol. 45 ›› Issue (11): 106-111. doi: 10.3969/j.issn.1000-565X.2017.11.015

• Biology •

Alignment-Free Sequence Evaluation of the Possibility of Virus Infection of Host Cells

刘雪梅¹ 臧翔¹ 黄天来¹ 杨哲¹,² 李文¹ 叶宇中¹ 胡珊³†

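  1. School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. ICBC Guangzhou Dongcheng Branch, Guangzhou 510100, Guangdong, China; 3. Computer Center, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, Guangdong, China
  • Received: 2017-01-13 Revised: 2017-07-09 Publication date: 2017-11-25 Release date: 2017-10-01
  • Corresponding author: HU Shan (born 1972), female, Ph.D., lecturer; main research area: bioinformatics. E-mail: hushan@mail.sysu.edu.cn
  • About the author: LIU Xue-mei (born 1975), female, Ph.D., associate professor; main research area: bioinformatics. E-mail: liuxm@scut.edu.cn
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (11205061, 11205062)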

Evaluation of Infection Possibility of Host Cell by Virus on the Basis of Sequence Alignment-Free Comparison

LIU Xue-mei¹ ZANG Xiang¹ HUANG Tian-lai¹ YANG Zhe¹,² LI Wen¹ YE Yu-zhong¹ HU Shan³

  1. School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. ICBC Guangzhou Dongcheng Branch, Guangzhou 510100, Guangdong, China; 3. Department of Biomedical Engineering, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510275, Guangdong, China
  • Received: 2017-01-13 Revised: 2017-07-09 Online: 2017-11-25 Published: 2017-10-01
  • Contact: HU Shan (born 1972), female, Ph.D., lecturer; main research area: bioinformatics. E-mail: hushan@mail.sysu.edu.cn
  • About author: LIU Xue-mei (born 1975), female, Ph.D., associate professor; main research area: bioinformatics. E-mail: liuxm@scut.edu.cn
  • Supported by:
    National Natural Science Foundation of China for Young Scientists (11205061, 11205062)

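Abstract (Chinese version): A virus and its host cell share similar word patterns (k-tuples) in their genetic information. The word-pattern-based statistical score between the DNA sequence of a virus and that of a host cell it can infect is often higher than the score against a random host cell; in other words, the DNA sequences of a virus and of a host cell it can infect are similar to some degree. Based on this principle, the paper uses the alignment-free statistics $D_2^S$ and $D_2^*$ to score a virus DNA sequence against a host-cell DNA sequence on the basis of word patterns, and compares the score with a derived threshold to decide whether the virus can infect the host cell. Experimental results show that, when k = 5 (k is the word-pattern size) and the Markov order is 1, both $D_2^S$ and $D_2^*$ reflect the genetic similarity between a virus and its host cell well, and that the optimal threshold obtained from the ROC (receiver operating characteristic) curve can serve as a criterion for judging whether a virus can infect a host cell.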

Keywords (Chinese version): bioinformatics, virus, host cell, alignment-free sequence method

Abstract: A virus and its host cell share similar word patterns (k-tuples) in their genetic information. The word-pattern-based score between the DNA sequence of a virus and that of a host cell it can infect is often higher than its score against a random host cell; that is, the DNA sequence of a virus is similar, to some degree, to that of its host. On the basis of this principle, two alignment-free statistics, $D_2^S$ and $D_2^*$, are used in this paper to score the DNA sequence of a virus against that of a host cell. The score is then compared with a threshold to judge whether the virus can infect the host cell. Experimental results show that, when k = 5 (k is the size of the k-tuple) and the Markov order is 1, both statistics reflect the genetic similarity between a virus and its host cell well, and that the optimal thresholds of $D_2^S$ and $D_2^*$ obtained from ROC (receiver operating characteristic) curves can be used to judge whether a virus can infect a host cell.
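
The scoring step summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard definitions of the two statistics from the alignment-free literature (usually written $D_2^S$ and $D_2^*$), fits the order-1 Markov background to each sequence separately, and adds a pseudocount of 1 to avoid zero probabilities. The function names `kmer_counts`, `markov1_word_probs`, and `d2_scores` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-tuples that consist only of A/C/G/T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def markov1_word_probs(seq, k):
    """Probability of every k-tuple under an order-1 Markov model fitted to seq.

    Assumption: start probabilities come from single-base frequencies and
    transitions from dinucleotide counts, both with a pseudocount of 1.
    """
    mono = kmer_counts(seq, 1)
    di = kmer_counts(seq, 2)
    total = sum(mono.values())
    start = {a: (mono[a] + 1) / (total + 4) for a in ALPHABET}
    trans = {}
    for a in ALPHABET:
        row = sum(di[a + b] for b in ALPHABET)
        for b in ALPHABET:
            trans[a + b] = (di[a + b] + 1) / (row + 4)
    probs = {}
    for letters in product(ALPHABET, repeat=k):
        p = start[letters[0]]
        for a, b in zip(letters, letters[1:]):
            p *= trans[a + b]
        probs["".join(letters)] = p
    return probs


def d2_scores(seq_x, seq_y, k=5):
    """Return (D2*, D2S) similarity scores for two DNA sequences."""
    cx, cy = kmer_counts(seq_x, k), kmer_counts(seq_y, k)
    nx, ny = sum(cx.values()), sum(cy.values())
    if nx == 0 or ny == 0:
        raise ValueError("each sequence must contain at least one valid k-tuple")
    px, py = markov1_word_probs(seq_x, k), markov1_word_probs(seq_y, k)
    d2_star = d2_s = 0.0
    for w in px:
        ex, ey = nx * px[w], ny * py[w]    # expected counts under the background
        x, y = cx[w] - ex, cy[w] - ey      # centered counts
        d2_star += x * y / math.sqrt(ex * ey)
        denom = math.sqrt(x * x + y * y)
        if denom > 0:
            d2_s += x * y / denom
    return d2_star, d2_s
```

Calling `d2_scores(virus_seq, host_seq, k=5)` on two DNA strings returns both scores; under the abstract's premise, a higher score suggests a more similar word-pattern usage. Real genomes would normally be read from FASTA files, which this sketch leaves out.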

Key words: bioinformatics, virus, host cell, sequence alignment-free comparison
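
The threshold step mentioned in the abstract can be illustrated with a small ROC computation. The abstract does not say how the "optimal" threshold is chosen from the ROC curve, so this sketch assumes one common criterion, Youden's J (true positive rate minus false positive rate). The function names `roc_points` and `best_threshold` are hypothetical, and a pair is predicted as "virus infects host" when its score is at least the threshold.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) for every distinct score used as a cutoff.

    labels are truthy for known virus-host pairs and falsy for non-pairs.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((t, tp / pos, fp / neg))
    return points


def best_threshold(scores, labels):
    """Pick the cutoff that maximizes Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] - p[2])[0]
```

In use, `scores` would hold the $D_2^S$ or $D_2^*$ values for a set of virus-host pairs with known infection status, and the returned cutoff would then be applied to new pairs.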
