Journal of South China University of Technology (Natural Science Edition) ›› 2014, Vol. 42 ›› Issue (3): 8-14. doi: 10.3969/j.issn.1000-565X.2014.03.002

• Electronics, Communication & Automation Technology •

Compression Algorithm of DNA Sequences Based on Mixed Statistical Model

Sun Ji-feng, Tong Xue-ke, Tan Li

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2013-08-09 Revised: 2013-12-03 Online: 2014-03-25 Published: 2014-02-19
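  • Contact: Sun Ji-feng (born 1962), male, professor and doctoral supervisor, mainly engaged in research on image and video processing and self-organizing communication networks. E-mail: ecjfsun@scut.edu.cn
  • About author: Sun Ji-feng (born 1962), male, professor and doctoral supervisor, mainly engaged in research on image and video processing and self-organizing communication networks.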
  • Supported by: the Young Scientists Fund of the National Natural Science Foundation of China (61202292)

Abstract:

This paper proposes a compression algorithm for DNA sequences based on a mixed statistical model. The algorithm estimates the probability of each symbol of a DNA sequence following the principle of the expert model algorithm (XM algorithm) and a mixed finite context statistical model. The estimated probability is then passed to an arithmetic coder to encode each symbol of standard DNA sequences. Experimental results show that (1) compared with a single finite context model, the mixed statistical model achieves better compression; (2) the proposed algorithm achieves a higher compression ratio than several classical compression algorithms; (3) it effectively overcomes the deficiencies of the XM algorithm on the standard DNA sequence dataset, although the XM algorithm, which is based on statistical information, is rather advanced; and (4) the proposed algorithm still needs improvement for the compression of high-throughput DNA sequences.
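
To make the idea of mixing finite context models concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the model orders, the smoothing, or the mixing rule, so the orders `(1, 2, 4, 8)`, the additive smoothing constant `alpha`, and the decayed multiplicative weight update are all assumptions made for illustration. Instead of running a real arithmetic coder, the sketch sums the ideal code length, `-log2(p)` bits per symbol, which an arithmetic coder approaches closely.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Order-k model: counts of the next base given the previous k bases."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # additive smoothing constant (assumed value)
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def _context(self, history):
        # history[-0:] would return the whole string, so handle order 0 apart.
        return history[-self.order:] if self.order else ""

    def predict(self, history):
        c = self.counts[self._context(history)]
        total = sum(c) + len(ALPHABET) * self.alpha
        return [(n + self.alpha) / total for n in c]

    def update(self, history, symbol_index):
        self.counts[self._context(history)][symbol_index] += 1


def bits_per_base(seq, orders=(1, 2, 4, 8), decay=0.95):
    """Ideal code length (bits per base) under a weighted mix of models.

    The weight update (hypothetical, loosely in the spirit of expert
    mixing) rewards models that assigned high probability to the symbol
    that actually occurred; `decay` lets old performance fade.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    max_order = max(orders)
    total_bits = 0.0
    for i, base in enumerate(seq):
        history = seq[max(0, i - max_order):i]
        s = ALPHABET.index(base)
        preds = [m.predict(history) for m in models]
        # Mixed probability of each symbol: weighted average over models.
        mixed = [sum(w * p[a] for w, p in zip(weights, preds))
                 for a in range(len(ALPHABET))]
        total_bits += -math.log2(mixed[s])  # cost an arithmetic coder approaches
        # Re-weight the models by how well each predicted the actual symbol.
        weights = [(w ** decay) * p[s] for w, p in zip(weights, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
        for m in models:
            m.update(history, s)
    return total_bits / len(seq)
```

Calling `bits_per_base("ACGT" * 200)` on a strongly periodic sequence yields a value far below the 2 bits per base of a uniform code, because every model quickly learns the repeating pattern; on real DNA the gain is much smaller and depends on the chosen orders.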

Key words: DNA sequence compression, XM algorithm, finite context model, mixed statistical model
