Journal of South China University of Technology (Natural Science)
"Word Frequency-Filtering" Hybrid Feature Selection Method Applied to Spam Identification
Received date: 2016-05-03
Revised date: 2016-10-26
Online published: 2017-02-02
Supported by the Natural Science Foundation of Guangdong Province of China (2016A030310412)
To address the increasingly rampant spam problem, this paper uses naive Bayes and support vector machine classifiers to identify spam emails. A "word frequency-filtering" hybrid feature selection method is applied to the classification models to improve their identification performance, and the naive Bayes classifier is further enhanced by considering a more comprehensive set of classification probability cases. Experiments are designed to test and verify the identification performance of the spam detection system in terms of accuracy rate, recall rate and F1 score. The results show that the proposed "word frequency-filtering" hybrid feature selection method effectively improves the identification performance of spam classifiers, and that the classification output adjustment module based on the cost-sensitive method greatly reduces the probability that the classifier mistakes a non-spam email for a spam email. In conclusion, the spam identification system designed and implemented in this paper is practical and applicable in everyday work and life.
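The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the method, so everything concrete here is an assumption made for illustration: the frequency stage is taken to be a document-frequency cutoff (`min_df`), the filtering stage is taken to be a chi-squared ranking (`top_k`), the classifier is a plain multinomial naive Bayes with Laplace smoothing, and the cost-sensitive output adjustment is reduced to a raised decision threshold. Reading "accuracy rate" as precision is also an assumption. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def select_features(docs, labels, min_df=2, top_k=3):
    """Stage 1 (assumed): drop words seen in fewer than min_df documents.
    Stage 2 (assumed): keep the top_k survivors ranked by chi-squared."""
    n, n_spam = len(docs), sum(labels)
    df = Counter(w for d in docs for w in set(d))
    df_spam = Counter(w for d, y in zip(docs, labels) if y == 1 for w in set(d))
    scores = {}
    for w, total_df in df.items():
        if total_df < min_df:
            continue
        a = df_spam[w]               # spam documents containing w
        b = total_df - a             # non-spam documents containing w
        c = n_spam - a               # spam documents without w
        d = (n - n_spam) - b         # non-spam documents without w
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[w] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:top_k])

def train_nb(docs, labels, vocab):
    """Count selected words per class (1 = spam, 0 = non-spam)."""
    counts = {0: Counter(), 1: Counter()}
    for d, y in zip(docs, labels):
        counts[y].update(w for w in d if w in vocab)
    prior = {y: labels.count(y) / len(labels) for y in (0, 1)}
    return counts, prior, vocab

def spam_probability(model, doc):
    """Posterior P(spam | doc) under Laplace-smoothed multinomial naive Bayes."""
    counts, prior, vocab = model
    log_p = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)
        log_p[y] = math.log(prior[y]) + sum(
            math.log((counts[y][w] + 1) / total) for w in doc if w in vocab)
    m = max(log_p.values())          # subtract the max for numerical stability
    e = {y: math.exp(v - m) for y, v in log_p.items()}
    return e[1] / (e[0] + e[1])

def classify(model, doc, threshold=0.9):
    """Assumed cost-sensitive rule: flag spam only when the posterior is high,
    so that fewer legitimate emails are mistaken for spam."""
    return int(spam_probability(model, doc) >= threshold)

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data, invented for illustration only.
docs = [["free", "money", "now"], ["free", "offer", "click"],
        ["win", "money", "free"], ["meeting", "tomorrow", "agenda"],
        ["project", "meeting", "notes"], ["lunch", "tomorrow"]]
labels = [1, 1, 1, 0, 0, 0]

vocab = select_features(docs, labels)
model = train_nb(docs, labels, vocab)
```

Raising `threshold` above 0.5 trades recall for fewer false positives, which is the direction of the trade-off the abstract attributes to its output adjustment module; the actual module in the paper may work differently.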
CHEN Jun-ying, ZHOU Shun-feng, MIN Hua-qing. "Word Frequency-Filtering" Hybrid Feature Selection Method Applied to Spam Identification[J]. Journal of South China University of Technology (Natural Science), 2017, 45(3): 82-88. DOI: 10.3969/j.issn.1000-565X.2017.03.012