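Received: 2016-05-03
Revised: 2016-10-26
Published online: 2017-02-02
Funding
Supported by the Natural Science Foundation of Guangdong Province (2016A030310412); the Guangdong Provincial Key Platform and Research Project for Universities, Young Innovative Talents Program (2015KQNCX003); the Guangzhou Science and Technology Program Key Laboratory Project (15180007); and the Guangzhou Science and Technology Program Project (201707010223)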
"Word Frequency-Filtering" Hybrid Feature Selection Method Applied to Spam Identification
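CHEN Junying, ZHOU Shunfeng, MIN Huaqing. "Word Frequency-Filtering" Hybrid Feature Selection Method Applied to Spam Identification[J]. Journal of South China University of Technology (Natural Science Edition), 2017, 45(3): 82-88. DOI: 10.3969/j.issn.1000-565X.2017.03.012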
To address the increasingly rampant spam problem, this paper uses naive Bayes and support vector machine classification methods to identify spam emails. A "word frequency-filtering" hybrid feature selection method is applied to the classification models to improve the identification performance of the classifiers, and the naive Bayes classifier is further enhanced by accounting for a more comprehensive set of classification probability cases. Experiments are designed to verify the identification performance of the spam detection system in terms of accuracy rate, recall rate and F1 score. The results show that the proposed "word frequency-filtering" hybrid feature selection method effectively improves the identification performance of spam classifiers, and that the classification output adjustment module, which is based on a cost-sensitive method, greatly reduces the probability that the classifier mistakes a non-spam email for a spam email. In conclusion, the spam identification system designed and implemented in this paper is practical and applicable to everyday work and life.
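The abstract does not spell out the selection rule or the adjustment module, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a two-stage selection (a document-frequency cut followed by a simple class-separation filter, used here as a stand-in), a multinomial naive Bayes classifier with Laplace smoothing, and a cost-sensitive decision threshold derived from hypothetical misclassification costs `cost_fp` and `cost_fn`. All function names, parameters and the toy data are illustrative. The metric helper reports precision, recall and F1.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def select_features(docs, labels, min_freq=2, top_k=6):
    """Hypothetical two-stage selection; assumes both classes are present."""
    df, df_spam = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for w in set(tokenize(doc)):
            df[w] += 1
            if y == 1:
                df_spam[w] += 1
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    # Stage 1 (word frequency): drop words that appear in too few documents.
    frequent = [w for w, c in df.items() if c >= min_freq]
    # Stage 2 (filter): rank by how differently the two classes use the word.
    def separation(w):
        return abs(df_spam[w] / n_spam - (df[w] - df_spam[w]) / n_ham)
    ranked = sorted(frequent, key=lambda w: (-separation(w), w))
    return set(ranked[:top_k])

def train_nb(docs, labels, vocab):
    """Count selected words per class (1 = spam, 0 = legitimate)."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(w for w in tokenize(doc) if w in vocab)
    prior = {c: labels.count(c) / len(labels) for c in (0, 1)}
    return counts, prior, vocab

def spam_probability(model, text):
    """Posterior P(spam | text) with Laplace (add-one) smoothing."""
    counts, prior, vocab = model
    log_p = {}
    for c in (0, 1):
        total = sum(counts[c].values()) + len(vocab)
        log_p[c] = math.log(prior[c])
        for w in tokenize(text):
            if w in vocab:
                log_p[c] += math.log((counts[c][w] + 1) / total)
    m = max(log_p.values())  # subtract the max for numerical stability
    e = {c: math.exp(v - m) for c, v in log_p.items()}
    return e[1] / (e[0] + e[1])

def classify(model, text, cost_fp=9.0, cost_fn=1.0):
    """Label as spam only if the posterior exceeds cost_fp / (cost_fp + cost_fn)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if spam_probability(model, text) > threshold else 0

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy corpus (made up for illustration only).
docs = ["win cash prize now", "cheap cash prize offer", "win cheap prize",
        "meeting agenda for monday", "project meeting notes",
        "monday project agenda"]
labels = [1, 1, 1, 0, 0, 0]
vocab = select_features(docs, labels)
model = train_nb(docs, labels, vocab)
```

Raising `cost_fp` relative to `cost_fn` raises the decision threshold, so a borderline message is kept as legitimate unless the classifier is quite confident. This trades some recall for fewer legitimate emails flagged as spam, which is the effect the abstract attributes to its cost-sensitive output adjustment.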