Journal of South China University of Technology (Natural Science)
Effects of Several Evaluation Metrics on Imbalanced Data Learning
Received date: 2009-03-12
Revised date: 2009-09-01
Online published: 2010-04-25
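Supported by
Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2007B090400031); Guangdong Higher Education Outstanding Young Innovative Talents Cultivation Project (LYM08074)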
Most traditional classifiers are optimized for accuracy and are therefore unsuitable for imbalanced data learning (IDL). This paper applies meta-learning to a support vector machine (SVM) model and investigates how IDL is affected by the following metrics: accuracy, balanced accuracy, geometric mean, F1 score, information gain, AUC (Area Under the ROC Curve), and two new metrics proposed in this paper, GAF and GBF. Simulation experiments are conducted on 16 imbalanced datasets from UCI, and the experimental results are analyzed statistically. The results indicate that (1) the metrics differ distinctly in their effects on classifier performance; (2) even for the SVM, an advanced learning method, the output classifier is still readily biased toward the majority class when it is selected by maximizing accuracy; (3) optimizing with the other metrics makes it feasible to output bias-rectified SVM classifiers with better overall performance, especially in predicting minority classes; and (4) the SVM classifiers optimized with the GAF and GBF metrics perform stably and well.
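To illustrate why accuracy can mislead on imbalanced data, the following minimal sketch computes four of the metrics named above from a binary confusion matrix, with the minority class treated as positive. The function name `binary_metrics` and the example counts are hypothetical, and the formulas are the commonly used definitions, which may differ in detail from those in the paper. GAF and GBF are omitted because the abstract does not define them.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Common metrics from a binary confusion matrix (minority class = positive).

    Assumed standard definitions; the paper's exact formulas are not
    given in the abstract.
    """
    total = tp + fn + fp + tn
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on the minority class
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # recall on the majority class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (tpr + tnr) / 2,
        "g_mean": math.sqrt(tpr * tnr),
        "f1": f1,
    }


# Hypothetical data: 990 majority and 10 minority samples.
# A classifier that always predicts the majority class:
trivial = binary_metrics(tp=0, fn=10, fp=0, tn=990)

# A classifier that finds 8 of 10 minority samples at the cost of 90 false alarms:
rectified = binary_metrics(tp=8, fn=2, fp=90, tn=900)

for name, scores in (("trivial", trivial), ("rectified", rectified)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up example the always-majority classifier has the higher accuracy but a geometric mean and F1 score of zero, while the second classifier has lower accuracy but higher balanced accuracy and geometric mean, which matches the kind of bias the abstract describes for accuracy-based selection.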
Lin Zhi-yong, Hao Zhi-feng, Yang Xiao-wei. Effects of Several Evaluation Metrics on Imbalanced Data Learning[J]. Journal of South China University of Technology (Natural Science), 2010, 38(4): 147-155. DOI: 10.3969/j.issn.1000-565X.2010.04.027