Computer Science and Technology

Transfer Learning for Classification on Imbalanced Data

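  • School of Computer Science and Engineering, South China University of Technology
CHEN Qiong (b. 1966), female, associate professor; her research focuses on artificial intelligence, machine learning, and intelligent computing.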

Received date: 2016-12-27

  Revised date: 2017-03-24

  Online published: 2017-12-01

Supported by

The National Natural Science Foundation of China (61573145) and the Natural Science Foundation of Guangdong Province of China (2015A030308018)

Cite this article

CHEN Qiong, XU Yangyang, CHEN Linqing. Transfer learning for classification on imbalanced data[J]. Journal of South China University of Technology (Natural Science Edition), 2018, 46(1): 122-130. DOI: 10.3969/j.issn.1000-565X.2018.01.016

Abstract

Traditional classification algorithms assume a roughly balanced class distribution, so they are challenged by the growing number of real-world data sets whose distributions are imbalanced. Transfer learning can address this problem by using related auxiliary data to compensate for the imbalanced target data. Building on TrAdaboost, this paper proposes UnbalancedTrAdaboost (UBTA), a binary-classification transfer learning algorithm for imbalanced data. UBTA computes the weights of the weak classifiers from the auprc (the Area Under the Precision-Recall Curve) of each class and applies different weight-update strategies to samples of different classes. Because AUC is insensitive to changes in class distribution, combining it with G-mean and BER gives a more accurate assessment of imbalanced classification algorithms. Experimental results on all three metrics show that UBTA achieves good classification performance: it increases the attention paid to the minority class while maintaining classification accuracy on the majority class.
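The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard definitions, G-mean = sqrt(TPR × TNR) and BER = 1 − (TPR + TNR) / 2, and estimates the per-class area under the precision-recall curve with average precision, which is one common estimator and may differ from the one used in the paper. All function names and the toy data are illustrative assumptions; this is not the UBTA weighting scheme itself, which the abstract does not specify in detail.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the class of interest."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive and true negative rates.

    Assumes both classes occur in y_true.
    """
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)


def balanced_error_rate(y_true, y_pred, positive=1):
    """Mean of the per-class error rates; assumes both classes occur."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return 1.0 - (tpr + tnr) / 2.0


def average_precision(y_true, scores, positive=1):
    """Step-wise estimate of the area under the precision-recall curve.

    `scores` are confidence values for the `positive` class; higher
    means more likely positive. Assumes at least one positive sample.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(1 for t in y_true if t == positive)
    hits = 0
    area = 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == positive:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / n_pos


# Toy data (hypothetical): 2 minority samples (label 1), 8 majority (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10

print(g_mean(y_true, always_majority))
print(balanced_error_rate(y_true, always_majority))
```

On this toy set, a classifier that always predicts the majority class is 80% accurate, yet its G-mean is 0 and its BER is 0.5, which is why such measures are preferred over plain accuracy when classes are imbalanced.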
