Journal of South China University of Technology (Natural Science Edition), 2024, Vol. 52, Issue (7): 62-71. doi: 10.12141/j.issn.1000-565X.230313

• Electronics, Communication and Automatic Control •

Attention Module Based on Feature Similarity and Feature Normalization

DU Qiliang (杜启亮)1,2,3, WANG Yimin (汪益民)1, TIAN Lianfang (田联房)1,4,5

  1. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  2. China-Singapore International Joint Research Institute, South China University of Technology, Guangzhou 510555, Guangdong, China
  3. Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education, South China University of Technology, Guangzhou 510640, Guangdong, China
  4. Research Institute of Modern Industrial Innovation, South China University of Technology, Zhuhai 519170, Guangdong, China
  5. Engineering Center of Guangdong Development and Reform Commission, South China University of Technology, Guangzhou 510031, Guangdong, China
  • Received: 2023-05-10 Published: 2024-07-25 Released online: 2023-12-22
  • About author: DU Qiliang (born 1980), male, Ph.D., associate research fellow, mainly engaged in research on pattern recognition and machine vision. E-mail: qldu@scut.edu.cn
  • Supported by:
    the Key-Area Research and Development Program of Guangdong Province (2020B1111010002)

Abstract:

In recent years, attention mechanisms have achieved great success in image classification, object detection and semantic segmentation. However, most existing attention mechanisms fuse features along only the channel dimension or only the spatial dimension, which greatly limits their flexibility to vary across both dimensions and prevents them from fully exploiting feature information. To address this issue, this paper proposes FSNAM, a convolutional neural network attention module based on feature similarity and feature normalization that can simultaneously use information from every dimension of the feature map. FSNAM consists of a feature similarity module (FSM) and a feature normalization module (FNM). FSM uses the channel feature information and local spatial feature information of the input feature map to generate a two-dimensional feature similarity weight map, while FNM uses the global spatial feature information of the input feature map to generate a three-dimensional feature normalization weight map. The two weight maps are fused into a three-dimensional attention weight map, thereby fusing channel feature information with spatial feature information. To demonstrate the feasibility and effectiveness of FSNAM, ablation experiments are conducted. The results show that, for image classification, FSNAM improves the performance of classification networks on the CIFAR dataset markedly more than other mainstream attention modules; for object detection, a detection network using FSNAM improves the detection accuracy for small and medium-sized objects in the VOC dataset by 3.9 and 1.2 percentage points, respectively; and, for semantic segmentation, FSNAM improves the performance of the HRNet model, raising its mean pixel accuracy on the SBD dataset by 0.58 percentage points.

Key words: convolutional neural network, computer vision, feature similarity, feature normalization, attention module
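
The abstract describes the data flow of FSNAM (a two-dimensional FSM map and a three-dimensional FNM map fused into a three-dimensional attention map) but gives no formulas. The following is only a minimal sketch of that data flow, not the authors' method. Everything concrete in it is an assumption made for illustration: the function names, the cosine similarity against a 3x3 neighbourhood mean for FSM, the per-channel standardization over all spatial positions for FNM, the sigmoid squashing, and the element-wise product used for fusion. Feature maps are plain nested lists indexed as `feat[c][i][j]`.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash scores into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def similarity_map(feat):
    """Return a 2-D (H x W) weight map.

    ASSUMPTION: each weight is the sigmoid of the cosine similarity between
    a pixel's channel vector and the mean channel vector of its 3x3
    neighbourhood (clipped at the image borders).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            pixel = [feat[c][i][j] for c in range(C)]
            nbrs = [(y, x)
                    for y in range(max(0, i - 1), min(H, i + 2))
                    for x in range(max(0, j - 1), min(W, j + 2))]
            mean = [sum(feat[c][y][x] for y, x in nbrs) / len(nbrs)
                    for c in range(C)]
            dot = sum(a * b for a, b in zip(pixel, mean))
            norms = (math.sqrt(sum(a * a for a in pixel))
                     * math.sqrt(sum(b * b for b in mean)))
            out[i][j] = sigmoid(dot / (norms + 1e-8))
    return out


def normalization_map(feat):
    """Return a 3-D (C x H x W) weight map.

    ASSUMPTION: each channel is standardized with its mean and standard
    deviation over all spatial positions (global spatial information),
    then passed through a sigmoid.
    """
    out = []
    for channel in feat:
        vals = [v for row in channel for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        std = math.sqrt(var + 1e-8)
        out.append([[sigmoid((v - mean) / std) for v in row]
                    for row in channel])
    return out


def fsnam_sketch(feat):
    """Fuse the 2-D and 3-D maps into a 3-D attention map and apply it.

    ASSUMPTION: fusion is an element-wise product, with the 2-D map
    shared across all channels.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    sim = similarity_map(feat)
    norm = normalization_map(feat)
    weights = [[[sim[i][j] * norm[c][i][j] for j in range(W)]
                for i in range(H)] for c in range(C)]
    refined = [[[feat[c][i][j] * weights[c][i][j] for j in range(W)]
                for i in range(H)] for c in range(C)]
    return weights, refined


if __name__ == "__main__":
    # Toy 3-channel, 4x4 feature map.
    feat = [[[0.1 * (c + i + j) for j in range(4)] for i in range(4)]
            for c in range(3)]
    weights, refined = fsnam_sketch(feat)
    print(len(weights), len(weights[0]), len(weights[0][0]))
```

The point of the sketch is the shapes: the similarity map has one value per spatial position, the normalization map has one value per channel and position, and their fusion yields a weight for every element of the input feature map.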

CLC Number: