Journal of South China University of Technology(Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (5): 13-23.doi: 10.12141/j.issn.1000-565X.220684

Special Issue: 2023 Computer Science & Technology

• Computer Science & Technology •

Contrastive Knowledge Distillation Method Based on Feature Space Embedding

YE Feng, CHEN Biao, LAI Yizong

  1. School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2022-10-24 Online: 2023-05-25 Published: 2023-01-16
  • Contact: YE Feng, E-mail: mefengye@scut.edu.cn
  • About author: YE Feng (b. 1972), male, Ph.D., associate professor, whose research focuses on machine vision and sensing control for mobile robots.
  • Supported by:
    the Key-Area R&D Program of Guangdong Province(2021B0101420003)

Abstract:

Because of its important role in model compression, knowledge distillation has attracted much attention in the field of deep learning. However, the classical knowledge distillation algorithm uses only the information of individual samples and neglects the relationships between samples, which limits its performance. To improve the efficiency and performance of knowledge transfer in knowledge distillation, this paper proposed a feature-space-embedding based contrastive knowledge distillation (FSECD) algorithm. The algorithm adopts an efficient batch construction strategy that embeds each student feature into the teacher feature space, so that each student feature forms N contrastive pairs with the N teacher features in the batch. In each pair, the teacher feature is fixed, while the student feature is tunable and to be optimized. During training, the distance between positive pairs is reduced and the distance between negative pairs is enlarged, so that the student model can perceive and learn the inter-sample relations of the teacher model, realizing the transfer of knowledge from teacher to student. Extensive experiments with different teacher/student architecture settings on the CIFAR-100 and ImageNet datasets show that the FSECD algorithm achieves significant performance improvement without additional network structures or data when compared with other cutting-edge distillation methods, which further demonstrates the importance of inter-sample relations in knowledge distillation.
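The pair construction described in the abstract can be sketched as a batch-wise contrastive loss. The following PyTorch snippet is a minimal illustration, not the authors' implementation: the function name `fsecd_loss`, the linear projection `proj`, and the temperature `tau` are assumptions, and the loss shown is a generic InfoNCE-style objective in which the diagonal of the student-teacher similarity matrix gives the N positive pairs and the off-diagonal entries give the negative pairs.

```python
import torch
import torch.nn.functional as F

def fsecd_loss(student_feats, teacher_feats, proj, tau=0.1):
    """Contrastive distillation sketch in the spirit of FSECD.

    student_feats: (N, d_s) student features for a batch of N samples
    teacher_feats: (N, d_t) teacher features (treated as fixed targets)
    proj: module embedding student features into the teacher feature space
    tau: temperature scaling the similarities (assumed hyperparameter)
    """
    s = F.normalize(proj(student_feats), dim=1)     # embed student into teacher space
    t = F.normalize(teacher_feats.detach(), dim=1)  # teacher side is fixed, not optimized
    logits = s @ t.t() / tau                        # N x N: each student feature forms
                                                    # N contrastive pairs with teachers
    targets = torch.arange(s.size(0))               # diagonal entries = positive pairs
    # Cross-entropy pulls each positive pair together and pushes the
    # N-1 negative pairs in the same row apart.
    return F.cross_entropy(logits, targets)
```

In this sketch, only `proj` and the student backbone receive gradients; `detach()` keeps the teacher features fixed, matching the asymmetry described in the abstract.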

Key words: image classification, knowledge distillation, convolutional neural network, deep learning, contrastive learning

CLC Number: