Journal of South China University of Technology (Natural Science Edition), 2014, Vol. 42, Issue (7): 21-27. doi: 10.3969/j.issn.1000-565X.2014.07.004

• Computer Science & Technology

A Novel Online Spam Identification Method Based on User Interest Degree

Wang You-wei, Liu Yuan-ning, Feng Li-zhou, Zhu Xiao-dong

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China
  • Received: 2013-09-17 Revised: 2014-05-09 Online: 2014-07-25 Published: 2014-06-01
  • Contact: Zhu Xiao-dong (born 1964), male, professor, mainly engaged in research on iris recognition and digital watermarking. E-mail: zhuxd@jlu.edu.cn
  • About author: Wang You-wei (born 1987), male, Ph.D. candidate, mainly engaged in research on spam filtering and digital image processing. E-mail: wyw4966198@126.com
  • Supported by: National Science and Technology Achievement Transformation Project (Caijian [2011] 329, Caijian [2012] 258)

Abstract:

Most online spam identification methods cannot effectively distinguish the degree of user interest in the contents of different emails, which leads to low identification precision. In this paper, a novel online spam identification method based on the support vector machine (SVM) is proposed. First, following the theories of incremental learning and active learning, representative samples are randomly selected from the training sets, and among them the samples whose classification is most uncertain are identified and submitted to users for labeling. Then, the concept of user interest degree is introduced, and a new sample labeling model and a new algorithm performance evaluation criterion are proposed. Finally, the "roulette" method is employed to add the labeled samples to the training sets. The results of various comparative experiments show that the proposed method achieves high spam identification precision, as well as high speed both in training and in selecting the samples to be labeled, so it is of high value for online applications.
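The active-learning and roulette steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes an already-trained linear SVM given by a weight vector `w` and bias `b`, and it measures uncertainty as the absolute decision value |f(x)|, which is a common choice in SVM-based active learning but is not confirmed here as the paper's exact criterion. The positive per-sample weights are a hypothetical stand-in for the user interest degree, because the abstract does not define that measure. The names `decision_value`, `pick_uncertain` and `roulette_select` are illustrative only.

```python
import random
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision function f(x) = w . x + b (model assumed already trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def pick_uncertain(pool: Sequence[Sequence[float]], w: Sequence[float], b: float,
                   k: int, sample_size: int, rng: random.Random) -> List[int]:
    """Randomly draw up to `sample_size` candidates from the unlabeled pool, then
    return the indices of the k candidates closest to the separating hyperplane
    (smallest |f(x)|), i.e. those whose classification is most uncertain."""
    candidates = rng.sample(range(len(pool)), min(sample_size, len(pool)))
    candidates.sort(key=lambda i: abs(decision_value(w, b, pool[i])))
    return candidates[:k]


def roulette_select(weights: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Roulette-wheel selection without replacement: each remaining index is drawn
    with probability proportional to its weight. Weights must be positive; here they
    are a hypothetical stand-in for the user interest degree of each labeled sample."""
    remaining = list(range(len(weights)))
    chosen: List[int] = []
    for _ in range(min(n, len(remaining))):
        total = sum(weights[i] for i in remaining)
        r = rng.uniform(0.0, total)
        pick = remaining[-1]  # fallback in case round-off leaves r >= final cumulative sum
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if r < acc:  # r landed in this index's slice of the wheel
                pick = i
                break
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [[0.9, 0.1], [0.2, 0.25], [0.5, 0.45], [0.1, 0.9]]  # toy feature vectors
    w, b = [1.0, -1.0], 0.0                                     # hypothetical trained model
    to_label = pick_uncertain(pool, w, b, k=2, sample_size=4, rng=rng)
    print("ask user to label:", to_label)
    interest = [0.3, 0.9]  # hypothetical interest degrees of the two labeled samples
    order = roulette_select(interest, n=1, rng=rng)
    print("add to training set:", [to_label[j] for j in order])
```

Ranking only a random candidate subset, instead of the whole pool, is one way to keep the uncertainty scan cheap on large mailboxes. Retraining the SVM on the enlarged training set is left out of this sketch.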

Key words: spam, support vector machines, incremental learning, active learning, user interest
