Computer Science & Technology

Effects of Several Evaluation Metrics on Imbalanced Data Learning

  • 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. School of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006, Guangdong, China; 3. College of Science, South China University of Technology, Guangzhou 510640, Guangdong, China
Lin Zhi-yong (born 1977), male, Ph.D. candidate, associate professor at Guangdong Polytechnic Normal University; his research focuses on machine learning and intelligent computing.

Received date: 2009-03-12

Revised date: 2009-09-01

Online published: 2010-04-25

Supported by

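Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2007B090400031); Guangdong Universities Outstanding Young Innovative Talents Cultivation Project (LYM08074)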

Abstract

Because most traditional classifiers optimized for accuracy are ill-suited to imbalanced data learning (IDL), this paper performs meta-learning on a support vector machine (SVM) model and investigates how IDL is affected by the following metrics: accuracy, balanced accuracy, geometric mean, F1 score, information gain, AUC (area under the ROC curve), and two new metrics proposed in this paper, GAF and GBF. Simulation experiments are conducted on 16 imbalanced datasets from UCI, followed by a statistical analysis of the results. The results indicate that (1) the metrics differ distinctly in their effects on classifier performance; (2) even for the SVM, an advanced learning method, the output classifier is still readily biased toward the majority class when it is selected by maximizing accuracy; (3) optimizing with the other metrics makes it feasible to output bias-rectified SVM classifiers with better overall performance, especially in predicting the minority classes; and (4) the SVM classifiers optimized with the GAF and GBF metrics perform stably and well.
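
To illustrate why accuracy alone can favor the majority class, the following minimal Python sketch computes four of the metrics named in the abstract from binary confusion counts. It uses common textbook definitions, which may differ in detail from those in the paper. The function name `imbalance_metrics` and the sample counts are hypothetical. Information gain, AUC, GAF, and GBF are left out; in particular, GAF and GBF are not defined in this abstract.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Compute metrics from binary confusion counts (minority class = positive).

    Textbook definitions; assumed here, not taken from the paper.
    """
    total = tp + fn + fp + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
        "f1": f1,
    }


# Hypothetical data: 10 minority and 990 majority samples, scored for a
# classifier that labels every sample as the majority class.
always_majority = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)
print(always_majority)
```

With these made-up counts, accuracy is high even though no minority sample is detected, while the geometric mean and F1 score fall to zero and balanced accuracy sits at one half. This is the kind of majority-class bias the abstract attributes to accuracy-based selection.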

Cite this article

Lin Zhi-yong, Hao Zhi-feng, Yang Xiao-wei. Effects of Several Evaluation Metrics on Imbalanced Data Learning[J]. Journal of South China University of Technology (Natural Science), 2010, 38(4): 147-155. DOI: 10.3969/j.issn.1000-565X.2010.04.027
