Received: 2013-09-17
Revised: 2014-05-09
Published online: 2014-06-01
Funding
National Science and Technology Achievement Transformation Project (财建[2011]329, 财建[2012]258)
A Novel Online Spam Identification Method Based on User Interest Degree
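Wang Youwei, Liu Yuanning, Feng Lizhou, Zhu Xiaodong. A Novel Online Spam Identification Method Based on User Interest Degree[J]. Journal of South China University of Technology (Natural Science Edition), 2014, 42(7): 21-27. DOI: 10.3969/j.issn.1000-565X.2014.07.004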
Most online spam identification methods cannot effectively distinguish the user's degree of interest in the contents of different emails, which leads to low identification precision. In this paper, a novel online spam identification method based on the support vector machine (SVM) is proposed. First, following the theories of incremental learning and active learning, representative samples are randomly selected from the training sets, and the samples whose classification is most uncertain are identified and presented to users for labeling. Then, the concept of user interest degree is introduced, and a new sample labeling model and a new criterion for evaluating algorithm performance are proposed. Finally, the "roulette" method is employed to add the labeled samples to the training sets. The results of various comparative experiments show that the proposed method achieves high spam identification precision while training quickly and selecting the samples to be labeled quickly, which makes it well suited to online application.
Key words: spam; support vector machines; incremental learning; active learning; user interest
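The abstract names two selection steps without giving their details: finding the samples with the most uncertain classification, and adding labeled samples by the "roulette" method. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes that uncertainty is measured by the absolute value of an SVM decision value (the closer to zero, the more uncertain) and that roulette selection draws samples with probability proportional to a non-negative weight. The function names `most_uncertain` and `roulette_select`, and the choice of weights, are hypothetical.

```python
import random
from typing import List, Sequence


def most_uncertain(decision_values: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k samples closest to the decision boundary.

    Assumption: a smaller |decision value| means a less certain classification.
    """
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]))
    return order[:k]


def roulette_select(weights: Sequence[float], n: int,
                    rng: random.Random) -> List[int]:
    """Draw n distinct indices, each with probability proportional to its weight.

    Assumption: weights are non-negative scores (hypothetical here); the
    abstract does not state what the roulette wheel is weighted by.
    """
    if n > len(weights):
        raise ValueError("cannot select more samples than are available")
    remaining = list(range(len(weights)))
    chosen: List[int] = []
    for _ in range(n):
        total = sum(weights[i] for i in remaining)
        if total <= 0:
            raise ValueError("no remaining sample has a positive weight")
        r = rng.random() * total  # a point on the wheel, in [0, total)
        acc = 0.0
        for pos, i in enumerate(remaining):
            acc += weights[i]
            if r < acc:
                chosen.append(remaining.pop(pos))
                break
        else:
            # Guard against floating-point rounding in the running sum.
            chosen.append(remaining.pop())
    return chosen


if __name__ == "__main__":
    scores = [1.8, -0.1, 0.6, -2.3, 0.05]   # made-up decision values
    print(most_uncertain(scores, 2))         # [4, 1]
    print(roulette_select([0.2, 0.5, 0.3], 2, random.Random(42)))
```

In this sketch the uncertain samples would be shown to the user for labeling, and the roulette step would then decide which labeled samples join the training set. Sampling without replacement keeps the same email from being added twice.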