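Received date: 2009-03-12
Revised date: 2009-09-01
Online published: 2010-04-25
Supported by
Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2007B090400031); Guangdong Higher Education Outstanding Young Innovative Talents Cultivation Project (LYM08074)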
Effects of Several Evaluation Metrics on Imbalanced Data Learning
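Lin Zhiyong, Hao Zhifeng, Yang Xiaowei. Effects of Several Evaluation Metrics on Imbalanced Data Learning[J]. Journal of South China University of Technology (Natural Science Edition), 2010, 38(4): 147-155. DOI: 10.3969/j.issn.1000-565X.2010.04.027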
Most traditional classifiers are optimized for accuracy and are therefore unsuitable for imbalanced data learning (IDL). This paper performs meta-learning on a support vector machine (SVM) model and investigates how IDL is affected by the following metrics: accuracy, balanced accuracy, geometric mean, F1 score, information gain, AUC (Area Under the ROC Curve), and two new metrics proposed in this paper, GAF and GBF. Simulation experiments are conducted on 16 imbalanced datasets from UCI, and the experimental results are analyzed statistically. The results indicate that (1) the metrics differ distinctly in their effects on classifier performance; (2) even for the SVM, an advanced learning method, the output classifier is still readily biased toward the majority class when it is selected by maximizing accuracy; (3) optimizing with the other metrics makes it feasible to output bias-rectified SVM classifiers with better overall performance, especially in predicting the minority class; and (4) the SVM classifiers optimized with the GAF and GBF metrics perform stably and well.
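To illustrate why selecting a classifier by accuracy can favor the majority class, the following minimal sketch computes four of the metrics named above from a binary confusion matrix, using their standard textbook definitions. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. Information gain, AUC, GAF and GBF are omitted because their formulas are not given in this abstract.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a binary confusion matrix.

    The positive class is assumed to be the minority class.
    """
    total = tp + fn + fp + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
        "f1": f1,
    }


# Hypothetical data: 990 majority examples, 10 minority examples.
# Classifier A labels everything as the majority class.
always_majority = binary_metrics(tp=0, fn=10, fp=0, tn=990)
# Classifier B finds 8 of the 10 minority examples at the cost of 50 false alarms.
bias_rectified = binary_metrics(tp=8, fn=2, fp=50, tn=940)

for name, scores in [("A", always_majority), ("B", bias_rectified)]:
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up case, classifier A scores higher on accuracy even though it never detects the minority class, while balanced accuracy, geometric mean and F1 all rank classifier B higher.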