Electronics, Communication & Automation Technology

Attention Module Based on Feature Similarity and Feature Normalization

  • DU Qiliang ,
  • WANG Yimin ,
  • TIAN Lianfang
  • 1.School of Automation Science and Engineering,South China University of Technology,Guangzhou 510640,Guangdong,China
    2.China-Singapore International Joint Research Institute,South China University of Technology,Guangzhou 510555,Guangdong,China
    3.Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education,South China University of Technology,Guangzhou 510640,Guangdong,China
    4.Research Institute of Modern Industrial Innovation,South China University of Technology,Zhuhai 519170,Guangdong,China
    5.Engineering Center of Guangdong Development and Reform Commission,South China University of Technology,Guangzhou 510031,Guangdong,China
DU Qiliang (1980—), male, Ph.D., associate researcher, mainly engaged in research on pattern recognition and machine vision. E-mail: qldu@scut.edu.cn

Received date: 2023-05-10

  Online published: 2023-12-22

Supported by

the Key-Area Research and Development Program of Guangdong Province(2020B1111010002)

Abstract

In recent years, attention mechanisms have achieved great success in image classification, object detection, and semantic segmentation. However, most existing attention mechanisms fuse features only along the channel dimension or the spatial dimension, which greatly limits their flexibility to vary across both dimensions and prevents them from fully exploiting feature information. To address this issue, this paper proposes a convolutional neural network attention module based on feature similarity and feature normalization (FSNAM), which exploits the feature information of both the channel domain and the spatial domain. FSNAM consists of a feature similarity module (FSM) and a feature normalization module (FNM). FSM generates a two-dimensional feature-similarity weight map from the channel information and local spatial information of the input feature map, while FNM generates a three-dimensional feature-normalization weight map from the global spatial information of the input feature map. The weight maps generated by FSM and FNM are fused into a three-dimensional attention weight map, achieving the fusion of channel and spatial feature information. Ablation experiments are conducted to demonstrate the feasibility and effectiveness of FSNAM. The results show that, for image classification, FSNAM significantly outperforms other mainstream attention modules in improving the performance of classification networks on the CIFAR datasets; for object detection, an object detection network using FSNAM improves the detection accuracy of small and medium-sized objects on the VOC dataset by 3.9 and 1.2 percentage points, respectively; and, for semantic segmentation, FSNAM significantly improves the performance of the HRNet model, raising its mean pixel accuracy on the SBD dataset by 0.58 percentage points.
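The abstract describes FSM producing a two-dimensional (H×W) similarity weight map and FNM a three-dimensional (C×H×W) normalization weight map, which are then fused into a single three-dimensional attention map applied to the input. The paper's actual formulas are not reproduced on this page, so the following NumPy sketch is purely illustrative: the cosine-similarity form of FSM, the per-channel standardization form of FNM, and the additive sigmoid fusion are all assumptions, chosen only to show how a 2-D and a 3-D weight map can be broadcast together.

```python
import numpy as np

def fsnam_sketch(x, eps=1e-5):
    """Illustrative sketch of the FSNAM fusion described in the abstract.

    x: input feature map of shape (C, H, W).
    Returns an attended feature map of the same shape.
    """
    c, h, w = x.shape

    # FSM (assumed form): a 2-D similarity weight map built from channel
    # information -- here, cosine similarity between each pixel's channel
    # vector and the global mean channel vector.
    mean_vec = x.mean(axis=(1, 2))                       # (C,)
    flat = x.reshape(c, -1)                              # (C, H*W)
    num = mean_vec @ flat                                # (H*W,)
    den = np.linalg.norm(mean_vec) * np.linalg.norm(flat, axis=0) + eps
    fsm = (num / den).reshape(h, w)                      # (H, W)

    # FNM (assumed form): a 3-D weight map from standardizing each channel
    # with its global spatial mean and standard deviation.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    fnm = (x - mu) / (sigma + eps)                       # (C, H, W)

    # Fusion (assumed form): broadcast the 2-D FSM map over channels, add
    # the FNM map, and squash with a sigmoid to get a 3-D attention map.
    attn = 1.0 / (1.0 + np.exp(-(fsm[None, :, :] + fnm)))
    return x * attn
```

Because the sigmoid keeps every attention weight in (0, 1), the module rescales features without changing the tensor shape, so it can be dropped between existing convolutional layers, as attention modules such as CBAM and BAM typically are.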

Cite this article

DU Qiliang, WANG Yimin, TIAN Lianfang. Attention Module Based on Feature Similarity and Feature Normalization[J]. Journal of South China University of Technology (Natural Science), 2024, 52(7): 62-71. DOI: 10.12141/j.issn.1000-565X.230313

References

1 DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]∥Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009: 248-255.
2 LIN T Y, MAIRE M, BELONGIE S,et al .Microsoft COCO:common objects in context[C]∥Proceedings of the 13th European Conference on Computer Vision.Zurich:Springer,2014:740-755.
3 HE K, ZHANG X, REN S,et al .Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE,2016:770-778.
4 HUANG G, LIU Z, VAN DER MAATEN L,et al .Densely connected convolutional networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE,2017:4700-4708.
5 HU J, SHEN L, SUN G .Squeeze-and-excitation networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:7132-7141.
6 DAI T, CAI J, ZHANG Y,et al .Second-order attention network for single image super-resolution[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:11065-11074.
7 ZHAO H, KONG X, HE J,et al .Efficient image super-resolution using pixel attention[C]∥Proceedings of the European Conference on Computer Vision.Glasgow:Springer,2020:56-72.
8 MNIH V, HEESS N, GRAVES A. Recurrent models of visual attention[J]. Advances in Neural Information Processing Systems, 2014, 2(12): 2204-2212.
9 BA J, MNIH V, KAVUKCUOGLU K. Multiple object recognition with visual attention[EB/OL]. (2015-04-23)[2023-05-10].
10 XU K, BA J, KIROS R,et al .Show,attend and tell:neural image caption generation with visual attention[C]∥Proceedings of the International Conference on Machine Learning.Lille:PMLR,2015:2048-2057.
11 GREGOR K, DANIHELKA I, GRAVES A,et al .DRAW:a recurrent neural network for image generation[C]∥Proceedings of the International Conference on Machine Learning.Lille:PMLR,2015:1462-1471.
12 WOO S, PARK J, LEE J Y,et al .CBAM:convolutional block attention module[C]∥Proceedings of the European Conference on Computer Vision.Munich:Springer,2018:3-19.
13 WANG X, GIRSHICK R, GUPTA A,et al .Non-local neural networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:7794-7803.
14 WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11534-11542.
15 PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. (2018-07-18)[2023-05-10].
16 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 31(17): 6000-6010.
17 LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
18 IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]∥Proceedings of the International Conference on Machine Learning. Lille: PMLR, 2015: 448-456.
19 LI X, SUN W, WU T .Attentive normalization[C]∥Proceedings of the European Conference on Computer Vision.Glasgow:Springer,2020:70-87.
20 YAO M, ZHAO G, ZHANG H, et al. Attention spiking neural networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9393-9410.
21 YANG L, ZHANG R Y, LI L,et al .SimAM:a simple,parameter-free attention module for convolutional neural networks[C]∥Proceedings of the International Conference on Machine Learning.Graz:PMLR,2021:11863-11874.
22 WEBB B S, DHRUV N T, SOLOMON S G, et al. Early and late mechanisms of surround suppression in striate cortex of macaque[J]. Journal of Neuroscience, 2005, 25(50): 11666-11675.
23 TAN S, ZHANG L, SHU X, et al. A feature-wise attention module based on the difference with surrounding features for convolutional neural networks[J]. Frontiers of Computer Science, 2023, 17(6): 338-348.
24 HE K, ZHANG X, REN S,et al .Identity mappings in deep residual networks[C]∥Proceedings of the 14th European Conference on Computer Vision.Amsterdam:Springer,2016:630-645.
25 EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The pascal visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136.
26 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2023-05-10].
27 HARIHARAN B, ARBELAEZ P, BOURDEV L,et al .Semantic contours from inverse detectors[C]∥ Proceedings of the 2011 International Conference on Computer Vision.Barcelona:IEEE,2011:991-998.
28 LOSHCHILOV I, HUTTER F. SGDR: stochastic gradient descent with warm restarts[EB/OL]. (2017-03-03)[2023-05-10].
29 WANG J, SUN K, CHENG T, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364.