Journal of South China University of Technology(Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (7): 62-71.doi: 10.12141/j.issn.1000-565X.230313

• Electronics, Communication & Automation Technology •

Attention Module Based on Feature Similarity and Feature Normalization

DU Qiliang1,2,3, WANG Yimin1, TIAN Lianfang1,4,5

  1. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
    2. China-Singapore International Joint Research Institute, South China University of Technology, Guangzhou 510555, Guangdong, China
    3. Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education, South China University of Technology, Guangzhou 510640, Guangdong, China
    4. Research Institute of Modern Industrial Innovation, South China University of Technology, Zhuhai 519170, Guangdong, China
    5. Engineering Center of Guangdong Development and Reform Commission, South China University of Technology, Guangzhou 510031, Guangdong, China
  • Received: 2023-05-10 Online: 2024-07-25 Published: 2023-12-22
  • About author: DU Qiliang (b. 1980), male, PhD, associate research fellow; his research interests include pattern recognition and machine vision. E-mail: qldu@scut.edu.cn
  • Supported by:
    the Key-Area Research and Development Program of Guangdong Province(2020B1111010002)

Abstract:

In recent years, attention mechanisms have achieved great success in image classification, object detection, and semantic segmentation. However, most existing attention mechanisms fuse features along only the channel dimension or only the spatial dimension, which greatly limits their flexibility across both dimensions and prevents them from fully exploiting the available feature information. To address this issue, this paper proposes a convolutional neural network attention module based on feature similarity and feature normalization (FSNAM), which exploits feature information from both the channel domain and the spatial domain. FSNAM consists of a feature similarity module (FSM) and a feature normalization module (FNM). FSM generates a two-dimensional feature similarity weight map from the channel information and local spatial information of the input feature map, while FNM generates a three-dimensional feature normalization weight map from the global spatial information of the input feature map. The weight maps produced by FSM and FNM are fused into a three-dimensional attention weight map, thereby combining channel and spatial feature information. Ablation experiments were conducted to demonstrate the feasibility and effectiveness of FSNAM. The results show that, for image classification, FSNAM significantly outperforms other mainstream attention modules in improving the performance of classification networks on the CIFAR datasets; for object detection, a detection network using FSNAM improves the detection accuracy on small and medium-sized objects in the VOC dataset by 3.9 and 1.2 percentage points, respectively; and, for semantic segmentation, FSNAM significantly improves the performance of the HRNet model, raising its mean pixel accuracy on the SBD dataset by 0.58 percentage points.
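The abstract specifies only the high-level design: a 2-D (H×W) similarity weight map from FSM, a 3-D (C×H×W) normalization weight map from FNM, and a broadcast fusion of the two. The internals of each module are not given here, so the following NumPy sketch fills them in with labeled assumptions: FSM is approximated as cosine similarity between each spatial position's channel vector and the globally pooled channel descriptor, and FNM as a per-channel normalization over the global spatial extent, each squashed by a sigmoid. This illustrates the weight-map shapes and the fusion step, not the authors' actual formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fsm(x):
    """Feature Similarity Module (sketch, assumed internals): cosine
    similarity between each spatial position's channel vector and the
    globally pooled channel descriptor gives a 2-D (H x W) weight map."""
    # x: (C, H, W)
    ref = x.mean(axis=(1, 2))                            # (C,) pooled descriptor
    ref = ref / (np.linalg.norm(ref) + 1e-6)
    feat = x / (np.linalg.norm(x, axis=0, keepdims=True) + 1e-6)
    sim = np.einsum('c,chw->hw', ref, feat)              # per-position similarity
    return sigmoid(sim)                                  # (H, W), values in (0, 1)

def fnm(x):
    """Feature Normalization Module (sketch, assumed internals): normalize
    each channel over its global spatial extent, giving a 3-D (C x H x W)
    weight map."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return sigmoid((x - mu) / np.sqrt(var + 1e-6))       # (C, H, W)

def fsnam(x):
    """Fuse the 2-D and 3-D weight maps into a single 3-D attention map
    via broadcasting, then reweight the input feature map."""
    w = fsm(x)[None, :, :] * fnm(x)                      # (C, H, W) attention
    return x * w

# Toy input: 8 channels over a 4x4 spatial grid.
x = np.random.randn(8, 4, 4)
y = fsnam(x)
assert y.shape == x.shape
```

The broadcast product in `fsnam` is one plausible way to realize the fusion the abstract describes; the paper itself may use a learned combination instead.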

Key words: convolutional neural network, computer vision, feature similarity, feature normalization, attention module
