Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (4): 26-34, 45. doi: 10.12141/j.issn.1000-565X.210267

Special Topic: Computer Science and Technology, 2022

• Computer Science and Technology •

An Imbalanced Classification Method Based on Adaptive Sampling

CHEN Qiong1, XIE Jialiang2

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2021-04-28  Revised: 2021-11-07  Online: 2022-04-25  Published: 2021-11-26
  • Corresponding author: CHEN Qiong (b. 1966), female, Ph.D., associate professor; her research interests include artificial intelligence, machine learning, and intelligent computing. E-mail: csqchen@scut.edu.cn
  • Supported by:
    the Key-Area Research and Development Program of Guangdong Province




Abstract: Traditional resampling methods mostly use fixed sampling strategies and cannot change the strategy according to the optimization needs of the model. To address this problem, this paper proposes an imbalanced classification method based on adaptive sampling (ASIC). ASIC dynamically adjusts the sampling probabilities of the classes in the training set according to the classification model's performance on a validation set, so that each class's sampling probability is determined by the needs of the current model. In addition, ASIC pays extra attention to minority classes: all other conditions being equal, it assigns them higher sampling probabilities to compensate for the adverse effect that their small sample sizes have on the model, thereby improving the model's ability to recognize minority classes. Experimental results show that classification models trained with ASIC outperform the comparison methods in both balanced accuracy and geometric mean, and the more imbalanced the data distribution, the more pronounced ASIC's advantage.
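The adaptive sampling idea described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function names (`adaptive_class_probs`, `sample_batch`), the `1 - recall` "need" weight, and the `boost` exponent for the minority bonus are all assumptions; only the overall scheme (validation recall drives per-class sampling probabilities, with extra weight for rare classes) comes from the abstract.

```python
import numpy as np

def adaptive_class_probs(val_recall, class_counts, boost=1.0, eps=1e-6):
    """Per-class sampling probabilities: classes the current model recognizes
    poorly on the validation set (low recall) get sampled more, and, other
    conditions being equal, smaller classes get an extra boost."""
    val_recall = np.asarray(val_recall, dtype=float)
    class_counts = np.asarray(class_counts, dtype=float)
    need = 1.0 - val_recall + eps                       # worse recall -> larger weight
    minority = (class_counts.sum() / class_counts) ** boost  # rarer class -> larger weight
    w = need * minority
    return w / w.sum()

def sample_batch(labels, probs, batch_size, rng):
    """Draw a training batch by spreading each class's probability mass
    evenly over that class's examples."""
    labels = np.asarray(labels)
    per_example = probs[labels] / np.bincount(labels)[labels]
    per_example = per_example / per_example.sum()
    return rng.choice(len(labels), size=batch_size, p=per_example)

rng = np.random.default_rng(0)
labels = np.array([0] * 90 + [1] * 10)  # imbalanced two-class toy set
probs = adaptive_class_probs(val_recall=[0.95, 0.60], class_counts=[90, 10])
batch = sample_batch(labels, probs, batch_size=32, rng=rng)
```

After each validation pass the probabilities would be recomputed, so the sampling strategy follows the current model rather than staying fixed; here the minority class, with both lower recall and fewer examples, dominates the next batch.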

Key words: imbalanced classification, adaptive sampling, recall
