Journal of South China University of Technology (Natural Science Edition) ›› 2012, Vol. 40 ›› Issue (8): 106-109.

• Bioengineering •

Local Alignment-Free Sequences Based on D2shepp Statistics

Liu Xue-mei (刘雪梅)  Wen De-hua (文德华)  Yu Huang-zhong (於黄忠)  Gao Ya-ni (高亚妮)

  1. Department of Physics, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2011-03-18 Revised: 2011-11-28 Online: 2012-08-25 Published: 2012-07-01
  • Contact: Liu Xue-mei (born 1975), female, Ph.D., lecturer, mainly engaged in biophysics research. E-mail: liuxm@scut.edu.cn
  • Supported by:

    National Natural Science Foundation of China (10947023, 61176061); Fundamental Research Funds for the Central Universities, South China University of Technology (20112M0088)

Abstract:

Measuring the similarity between two biological sequences is one of the main problems in computational biology, and a fast D2shepp statistic based on the joint k-tuple content of the two sequences has already been applied to alignment-free sequence comparison. In this paper, two mutually independent random sequences are generated under the null model, a local comparison of the two sequences is carried out with the D2shepp statistic, and the optimal local values are found and summed. On this basis, the distribution of the Power value is simulated and analyzed for different values of the parameter k. Finally, with the same parameters, the proposed local comparison is compared with the existing global comparison based on the D2shepp statistic. The Power value of the local D2shepp statistic rapidly approaches 1 as the sequence length increases, and the local comparison is faster and more accurate than the global one.

Key words: alignment-free sequence comparison, D2shepp statistics, local alignment, Power value
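
The pipeline summarized in the abstract (null-model sequences, a k-tuple statistic, local optima that are summed) can be illustrated with a minimal Python sketch. Everything in it is an assumption, since this page gives neither the formula nor the windowing scheme. The statistic uses the form commonly given in the alignment-free literature, where each k-tuple count is centred by its expected value under an i.i.d. letter model and the products are normalised by sqrt(x² + y²). The local variant (`local_d2shepp`, with non-overlapping windows of a hypothetical size `window`) is only one plausible reading of "find the optimal local values and sum them", not the authors' procedure. The Power simulation is omitted because the alternative model is not described here.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def random_sequence(n, probs, rng):
    """Draw an i.i.d. sequence of length n (the null model assumed in this sketch)."""
    weights = [probs[c] for c in ALPHABET]
    return "".join(rng.choices(ALPHABET, weights=weights, k=n))


def d2shepp(seq_a, seq_b, k, probs):
    """Sum over all k-tuples of x*y / sqrt(x^2 + y^2), with centred counts x and y."""
    n_a = len(seq_a) - k + 1  # number of k-tuple positions in seq_a
    n_b = len(seq_b) - k + 1
    count_a = Counter(seq_a[i:i + k] for i in range(n_a))
    count_b = Counter(seq_b[i:i + k] for i in range(n_b))
    total = 0.0
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p_word = math.prod(probs[c] for c in letters)  # word probability under i.i.d. letters
        x = count_a[word] - n_a * p_word
        y = count_b[word] - n_b * p_word
        norm = math.sqrt(x * x + y * y)
        if norm > 0:  # skip words where both centred counts are exactly zero
            total += x * y / norm
    return total


def local_d2shepp(seq_a, seq_b, k, probs, window):
    """Hypothetical local variant: for each window of seq_a, keep the
    best-scoring window of seq_b, then sum these optima."""
    wins_a = [seq_a[i:i + window] for i in range(0, len(seq_a) - window + 1, window)]
    wins_b = [seq_b[i:i + window] for i in range(0, len(seq_b) - window + 1, window)]
    return sum(max(d2shepp(a, b, k, probs) for b in wins_b) for a in wins_a)


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = {c: 0.25 for c in ALPHABET}
    a = random_sequence(400, uniform, rng)
    b = random_sequence(400, uniform, rng)
    print("global:", d2shepp(a, b, 3, uniform))
    print("local :", local_d2shepp(a, b, 3, uniform, window=100))
```

When `window` equals the full sequence length, the local score reduces to the global one, which gives a quick sanity check. Estimating a Power value would additionally require sequences drawn from an alternative model and a rejection threshold taken from the null distribution of the statistic.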