Journal of South China University of Technology (Natural Science Edition) ›› 2011, Vol. 39 ›› Issue (5): 68-72. doi: 10.3969/j.issn.1000-565X.2011.05.012

• Computer Science and Technology •

Detection of Embedded Malware Based on C4.5 Decision Tree

Zhang Fu-yong  Qi De-yu  Hu Jing-lin   

  1. Research Institute of Computer Systems, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2010-06-09 Revised: 2010-10-26 Online: 2011-05-25 Published: 2011-04-01
  • Contact: Zhang Fu-yong, E-mail: z.fuyong@mail.scut.edu.cn
  • About author: Zhang Fu-yong (born 1982), male, Ph.D. candidate, mainly engaged in computer security research
  • Supported by:

    National Technology Innovation Fund project (08C26214411198); Guangdong-Hong Kong Key Fields Breakthrough Project (2008A011400010)

Abstract:

Embedded malware has become a new computer security threat because it is highly concealed and hard to detect. Existing statistical analysis methods are ineffective because they do not fully account for two characteristics of embedded malware: the malicious code occupies only a small number of bytes, yet it carries high information gain. To address this, a detection method based on the C4.5 decision tree is proposed. The 500 3-grams with the highest information gain are extracted from the training samples and used as attribute features to build a decision tree, which is then used to detect unknown embedded malware. Experimental results show that the proposed method has a clear advantage over existing methods in both detection rate and classification accuracy, and that it achieves a detection rate of 99.80% on Word documents infected with embedded malware.

Key words: embedded malware, C4.5 decision tree, malware detection, Boosting algorithm
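
The feature-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each byte 3-gram is treated as a binary present/absent feature over the raw file contents, and the function names (`byte_ngrams`, `entropy`, `top_k_by_information_gain`) and the toy samples are hypothetical. The selected 3-grams would then serve as the attributes of a C4.5 decision tree, which is not shown here.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 3) -> set:
    """Return the set of distinct byte n-grams present in a file's contents."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Shannon entropy (in bits) of a two-class split with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def top_k_by_information_gain(samples, labels, k=500, n=3):
    """Rank n-grams by the information gain of the feature 'gram is present'.

    samples: list of bytes objects; labels: list of bools (True = infected).
    Returns (the k best grams, dict mapping every gram to its gain).
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    base = entropy(total_pos, total_neg)

    # Count, per gram, how many infected and benign samples contain it.
    pos_with, neg_with = Counter(), Counter()
    for sample, infected in zip(samples, labels):
        (pos_with if infected else neg_with).update(byte_ngrams(sample, n))

    gains = {}
    for gram in set(pos_with) | set(neg_with):
        p1, n1 = pos_with[gram], neg_with[gram]    # samples containing the gram
        p0, n0 = total_pos - p1, total_neg - n1    # samples lacking the gram
        conditional = ((p1 + n1) * entropy(p1, n1)
                       + (p0 + n0) * entropy(p0, n0)) / len(labels)
        gains[gram] = base - conditional

    # Highest gain first; ties are broken by byte order for determinism.
    ranked = sorted(gains, key=lambda g: (-gains[g], g))
    return ranked[:k], gains


# Toy data (hypothetical): two benign files and two files carrying the
# same short embedded byte sequence.
samples = [b"hello world", b"hello-there",
           b"hello\x90\x90\x90world", b"\x90\x90\x90 there"]
labels = [False, False, True, True]
top, gains = top_k_by_information_gain(samples, labels, k=5)
```

In this toy set the only 3-gram that separates the two classes perfectly is the embedded sequence, so it ranks first even though it occupies just a few bytes of each infected file, which is the property the abstract highlights.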