Electronics, Communication and Automatic Control


Attention Module Based on Feature Similarity and Feature Normalization

  • DU Qiliang ,
  • WANG Yimin ,
  • TIAN Lianfang
  • 1.School of Automation Science and Engineering,South China University of Technology,Guangzhou 510640,Guangdong,China
    2.China-Singapore International Joint Research Institute,South China University of Technology,Guangzhou 510555,Guangdong,China
    3.Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education,South China University of Technology,Guangzhou 510640,Guangdong,China
    4.Research Institute of Modern Industrial Innovation,South China University of Technology,Zhuhai 519170,Guangdong,China
    5.Engineering Center of Guangdong Development and Reform Commission,South China University of Technology,Guangzhou 510031,Guangdong,China
DU Qiliang (b. 1980), male, Ph.D., associate researcher, mainly engaged in research on pattern recognition and machine vision. E-mail: qldu@scut.edu.cn

Received date: 2023-05-10

  Online published: 2023-12-22

Supported by

the Key-Area Research and Development Program of Guangdong Province(2020B1111010002)


Cite this article

DU Qiliang, WANG Yimin, TIAN Lianfang. Attention module based on feature similarity and feature normalization[J]. Journal of South China University of Technology (Natural Science Edition), 2024, 52(7): 62-71. DOI: 10.12141/j.issn.1000-565X.230313

Abstract

In recent years, attention mechanisms have achieved great success in image classification, object detection and semantic segmentation. However, most existing attention mechanisms can achieve feature fusion only in the channel or the spatial dimension, which greatly limits their flexibility across the channel and spatial dimensions and prevents them from fully exploiting feature information. To address this issue, this paper proposes a convolutional neural network attention module based on feature similarity and feature normalization (FSNAM), which can exploit feature information from both the channel and spatial domains. FSNAM consists of a feature similarity module (FSM) and a feature normalization module (FNM). FSM generates a two-dimensional feature similarity weight map from the channel feature information and local spatial feature information of the input feature map, while FNM generates a three-dimensional feature normalization weight map from the global spatial feature information of the input feature map. The weight maps generated by FSM and FNM are fused into a three-dimensional attention weight map, thereby achieving the fusion of channel and spatial feature information. To demonstrate the feasibility and effectiveness of FSNAM, ablation experiments were conducted. The results show that, for image classification, FSNAM significantly outperforms other mainstream attention modules in improving the performance of the classification network on the CIFAR datasets; for object detection, an object detection network using FSNAM improves the detection accuracy for small and medium-sized objects in the VOC dataset by 3.9 and 1.2 percentage points, respectively; and, for semantic segmentation, FSNAM improves the performance of the HRNet model, raising its mean pixel accuracy on the SBD dataset by 0.58 percentage points.
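The abstract's pipeline (a 2-D FSM map and a 3-D FNM map fused into a 3-D attention map that reweights the input feature map) can be illustrated with a small NumPy sketch. Everything below is an illustrative assumption, not the paper's actual formulation: here FSM is approximated by cosine similarity between each position's channel vector and the mean vector of its local neighbourhood, FNM by per-channel standardization with global spatial statistics, and fusion by broadcast multiplication followed by a sigmoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fsm_weight(x, k=3):
    """2-D feature-similarity map of shape (H, W): cosine similarity between
    each position's channel vector and the mean channel vector of its k x k
    neighbourhood (illustrative choice, not the paper's exact measure)."""
    c, h, w = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    sim = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            local = xp[:, i:i + k, j:j + k].mean(axis=(1, 2))  # neighbourhood mean
            v = x[:, i, j]
            sim[i, j] = v @ local / (np.linalg.norm(v) * np.linalg.norm(local) + 1e-8)
    return sim

def fnm_weight(x):
    """3-D feature-normalization map of shape (C, H, W): each channel is
    standardized with its global spatial mean/std, then squashed to (0, 1)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return sigmoid((x - mu) / (sigma + 1e-8))

def fsnam(x):
    """Fuse the 2-D FSM map and the 3-D FNM map into a 3-D attention map
    (broadcast multiply + sigmoid) and reweight the input feature map."""
    attn = sigmoid(fsm_weight(x)[None, :, :] * fnm_weight(x))  # -> (C, H, W)
    return x * attn

x = np.random.default_rng(0).standard_normal((8, 16, 16))  # toy (C, H, W) feature map
y = fsnam(x)
print(y.shape)  # (8, 16, 16)
```

The key point the sketch captures is dimensional: a channel-only or spatial-only module produces a (C, 1, 1) or (1, H, W) weight map, whereas fusing the two sources here yields a full (C, H, W) attention map, so every element of the feature map can receive its own weight.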
