Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (4): 26-34, 45. doi: 10.12141/j.issn.1000-565X.210267

Special Issue: 2022 Computer Science and Technology

• Computer Science & Technology •

An Imbalanced Classification Method Based on Adaptive Sampling

CHEN Qiong1, XIE Jialiang2

  1. School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu 233030, Anhui,
    China; 2. School of Electronics and Information Engineering, Anhui University, Hefei 230601, Anhui, China

  • Received: 2021-04-28 Revised: 2021-11-07 Online: 2022-04-25 Published: 2021-11-26
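  • Contact: CHEN Qiong (born 1966), female, Ph.D., associate professor; her research covers artificial intelligence, machine learning, and intelligent computing. E-mail: csqchen@scut.edu.cn
  • About author: CHEN Qiong (born 1966), female, Ph.D., associate professor; her research covers artificial intelligence, machine learning, and intelligent computing.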
  • Supported by:
    Key-Area Research and Development Program of Guangdong Province

Abstract: Traditional resampling methods mostly use fixed sampling strategies and cannot adapt the strategy to the optimization needs of the model. To address this problem, this paper proposes an imbalanced classification method based on adaptive sampling (Adaptive Sampling Imbalanced Classification, ASIC). The method dynamically adjusts the sampling probability of each class in the training set according to the classification model's performance on the validation set, so that the sampling probabilities of the classes are determined by the needs of the current model. The method also pays extra attention to minority classes: other conditions being equal, it gives them a higher sampling probability to compensate for the negative effect that their small number of examples has on the classification model, thereby improving the model's ability to recognize minority classes. Experimental results show that classification models trained with ASIC outperform the comparison methods in balanced accuracy and geometric mean, and that the advantage of ASIC becomes more pronounced as the data distribution becomes more imbalanced.
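
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the ASIC update rule, so the weighting below is an assumption made purely for illustration: each class is weighted by its validation shortfall (1 - recall) multiplied by a minority boost (largest class size divided by the class size). The function names, the `eps` parameter, and the use of per-class recall as the validation signal are all hypothetical. The two evaluation metrics use their standard definitions: balanced accuracy is the arithmetic mean of per-class recall, and the geometric mean is the geometric mean of per-class recall.

```python
import math
import random


def per_class_recall(y_true, y_pred, n_classes):
    """Recall of each class: correct predictions / true samples of that class."""
    hits = [0] * n_classes
    totals = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return [h / n if n else 0.0 for h, n in zip(hits, totals)]


def class_sampling_probs(val_recall, train_counts, eps=1e-3):
    """Hypothetical rule, NOT the paper's formula.

    need  = 1 - validation recall (+ eps so no class gets probability zero)
    boost = largest class size / this class size (favours minority classes)
    Assumes every class has at least one training sample.
    """
    largest = max(train_counts)
    weights = [(1.0 - r + eps) * (largest / n)
               for r, n in zip(val_recall, train_counts)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(y_train, class_probs, batch_size, rng):
    """Draw a class for each slot, then a uniform training index in that class."""
    by_class = {}
    for i, label in enumerate(y_train):
        by_class.setdefault(label, []).append(i)
    classes = rng.choices(range(len(class_probs)), weights=class_probs,
                          k=batch_size)
    return [rng.choice(by_class[c]) for c in classes]


def balanced_accuracy(recall):
    """Arithmetic mean of per-class recall."""
    return sum(recall) / len(recall)


def geometric_mean(recall):
    """Geometric mean of per-class recall."""
    return math.prod(recall) ** (1.0 / len(recall))


if __name__ == "__main__":
    # Toy data: class 0 is the majority (90 samples), class 1 the minority (10).
    y_train = [0] * 90 + [1] * 10
    val_true = [0, 0, 0, 0, 1, 1]
    val_pred = [0, 0, 0, 1, 1, 0]

    recall = per_class_recall(val_true, val_pred, n_classes=2)
    probs = class_sampling_probs(recall, train_counts=[90, 10])
    batch = sample_batch(y_train, probs, batch_size=8, rng=random.Random(0))

    print("recall:", recall)
    print("sampling probabilities:", probs)
    print("batch indices:", batch)
    print("balanced accuracy:", balanced_accuracy(recall))
    print("geometric mean:", geometric_mean(recall))
```

In a training loop, the probabilities would be recomputed after each validation pass, so a class that the current model recognizes poorly is drawn more often in the next round. With equal recall, the boost term alone decides the split, which mirrors the abstract's statement that minority classes receive a higher probability when other conditions are the same.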

Key words: imbalanced classification, adaptive sampling, recall
