Journal of South China University of Technology (Natural Science)
Attention Module Based on Feature Similarity and Feature Normalization
Received date: 2023-05-10
Online published: 2023-12-22
Supported by the Key-Area Research and Development Program of Guangdong Province (2020B1111010002)
In recent years, attention mechanisms have achieved great success in image classification, object detection and semantic segmentation. However, most existing attention mechanisms fuse features only along the channel dimension or the spatial dimension, which greatly limits their flexibility across the channel and spatial dimensions and prevents them from fully exploiting feature information. To address this issue, this paper proposes a convolutional neural network attention module based on feature similarity and feature normalization (FSNAM), which can exploit feature information from both the channel domain and the spatial domain. FSNAM consists of a feature similarity module (FSM) and a feature normalization module (FNM). FSM generates a two-dimensional feature similarity weight map from the channel feature information and local spatial feature information of the input feature map, while FNM generates a three-dimensional feature normalization weight map from the global spatial feature information of the input feature map. The weight maps generated by FSM and FNM are then fused into a three-dimensional attention weight map, achieving the fusion of channel and spatial feature information. Ablation experiments are conducted to demonstrate the feasibility and effectiveness of FSNAM. The results show that, for image classification, FSNAM significantly outperforms other mainstream attention modules in improving classification networks on the CIFAR dataset; for object detection, a detection network using FSNAM improves the detection accuracy of small and medium-sized objects on the VOC dataset by 3.9 and 1.2 percentage points, respectively; and, for semantic segmentation, FSNAM significantly improves the HRNet model, raising its mean pixel accuracy on the SBD dataset by 0.58 percentage points.
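The fusion described in the abstract can be sketched as follows: a 2-D (H×W) weight map from an FSM-like similarity measure is broadcast against a 3-D (C×H×W) normalization map from an FNM-like branch, and the result reweights the input feature map. This is a minimal illustrative sketch only, assuming simple stand-ins for the paper's modules; the function names (`fsm_weights`, `fnm_weights`, `fsnam`) and the specific similarity and normalization formulas are hypothetical, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fsm_weights(x):
    """FSM stand-in: collapse channel and local spatial information into a
    2-D (H, W) similarity map. Here 'similarity' is approximated as the
    negative distance of each position's channel vector from the channel mean."""
    mean_vec = x.mean(axis=(1, 2), keepdims=True)       # (C, 1, 1) channel means
    dist = np.sqrt(((x - mean_vec) ** 2).sum(axis=0))   # (H, W) distance map
    return -dist                                        # larger = more similar

def fnm_weights(x, eps=1e-5):
    """FNM stand-in: use global spatial statistics of each channel to build a
    3-D (C, H, W) normalization weight map."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)              # (C, H, W)

def fsnam(x):
    """Fuse the 2-D FSM map and 3-D FNM map into a 3-D attention map
    (via broadcasting) and reweight the input, as the abstract describes."""
    w2d = fsm_weights(x)                                # (H, W)
    w3d = fnm_weights(x)                                # (C, H, W)
    attn = sigmoid(w2d[None, :, :] + w3d)               # broadcast to (C, H, W)
    return x * attn

x = np.random.default_rng(0).normal(size=(8, 4, 4))     # toy feature map (C, H, W)
y = fsnam(x)
print(y.shape)                                          # (8, 4, 4)
```

The sketch keeps the key structural point of FSNAM: the 2-D and 3-D branches are fused into a single 3-D attention map rather than applying channel and spatial attention in separate stages.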
DU Qiliang, WANG Yimin, TIAN Lianfang. Attention Module Based on Feature Similarity and Feature Normalization[J]. Journal of South China University of Technology (Natural Science), 2024, 52(7): 62-71. DOI: 10.12141/j.issn.1000-565X.230313