Journal of South China University of Technology (Natural Science Edition), 2022, Vol. 50, Issue (12): 20-29. doi: 10.12141/j.issn.1000-565X.220055

Special topic: 2022 Computer Science and Technology

• Computer Science and Technology •

Two-Stream Adaptive Attention Graph Convolutional Networks for Action Recognition

DU Qiliang1,2,3, XIANG Zhaoyi1, TIAN Lianfang1,2,4, YU Lubin1

  1. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  2. China-Singapore International Joint Research Institute, South China University of Technology, Guangzhou 510555, Guangdong, China
  3. Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education, South China University of Technology, Guangzhou 510640, Guangdong, China
  4. Research Institute of Modern Industrial Innovation, South China University of Technology, Zhuhai 519170, Guangdong, China
  • Received: 2022-02-11  Publication date: 2022-12-25  Release date: 2022-04-08
  • Corresponding author: DU Qiliang (1980-), male, Ph.D., associate researcher, mainly engaged in research on pattern recognition and machine vision. E-mail: qldu@scut.edu.cn
  • Supported by:
    the Guangdong Provincial Special Project for the Development of Ocean Economy (GDNRC[2020]018); the Key-Area R&D Project of Guangdong Province (2019B020214001); the Guangzhou Major Industrial Technology Research Program (2019-01-01-12-1006-0001); the Fundamental Research Funds for the Central Universities, South China University of Technology (2018KZ05); the Graduate Education Reform Project of South China University of Technology (zysk2018005)

Abstract:

Human action recognition has attracted considerable attention in computer vision because of its important role in public safety. However, when fusing the neighborhood features of multi-scale nodes, existing graph convolutional networks usually sum the adjacency matrices of different orders directly. Every term then carries the same importance, which makes it hard to focus on important features and hinders the establishment of optimal node relationships. In addition, the common two-stream fusion method, which averages the predictions of different models, ignores differences in the underlying data distributions, so the fusion effect is poor. To address these problems, this paper proposes a two-stream adaptive attention graph convolutional network for human action recognition. First, a multi-order adjacency matrix that adaptively balances its weights is designed so that the model focuses on the more important neighborhoods. Second, a multi-scale spatio-temporal self-attention module and a channel attention module are designed to enhance the feature extraction capability of the model. Finally, a two-stream fusion network is proposed that uses the data distribution of the two streams' prediction results to determine the fusion coefficients, improving the fusion effect. The algorithm achieves recognition accuracies of 92.3% and 97.5% on the cross-subject and cross-view subsets of NTU RGB+D, respectively, and 39.8% on the Kinetics-Skeleton dataset, all higher than those of existing algorithms, which indicates its superiority for human action recognition.

Key words: action recognition, graph convolutional network, adjacency matrix, attention, two-stream fusion
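
The abstract gives no formulas, so the following is only a minimal Python sketch of two ideas it names, written under stated assumptions rather than as the authors' implementation. It assumes that (a) the multi-order adjacency matrices are blended with softmax weights over hypothetical learnable logits (`order_logits`), and (b) the fusion coefficient is a sigmoid of a linear function of both streams' predicted class distributions (`gate_weights`, `gate_bias`), which stands in for the paper's fusion network. All function and parameter names are illustrative.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def walk_matrices(adj, max_order):
    """Row-normalised indicators of joints linked by a walk of exactly k edges,
    for k = 0..max_order (k = 0 is the identity)."""
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    mats = []
    for _ in range(max_order + 1):
        binary = [[1.0 if v > 0 else 0.0 for v in row] for row in power]
        mats.append([[v / max(sum(row), 1.0) for v in row] for row in binary])
        power = matmul(power, adj)
    return mats


def adaptive_adjacency(mats, order_logits):
    """Blend the per-order matrices with softmax weights.

    Assumption: one learnable logit per order. Equal logits give uniform
    weights, i.e. the direct-sum baseline up to a constant scale."""
    weights = softmax(order_logits)
    n = len(mats[0])
    blended = [[sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(n)]
               for i in range(n)]
    return blended, weights


def fuse_streams(logits_a, logits_b, gate_weights, gate_bias):
    """Fuse two streams' class scores with a sample-specific coefficient alpha.

    Assumption: alpha is a sigmoid of a linear function of both predicted
    distributions, a stand-in for a learned fusion network."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    z = sum(w * p for w, p in zip(gate_weights, p_a + p_b)) + gate_bias
    alpha = 1.0 / (1.0 + math.exp(-z))
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(p_a, p_b)]
    return fused, alpha


if __name__ == "__main__":
    # Toy 4-joint chain 0-1-2-3 (illustrative, not a real skeleton layout).
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    mats = walk_matrices(chain, max_order=2)
    blended, weights = adaptive_adjacency(mats, order_logits=[0.0, 1.0, -1.0])
    print("order weights:", [round(w, 3) for w in weights])
    fused, alpha = fuse_streams([2.0, 0.5, 0.1], [0.3, 1.8, 0.2],
                                gate_weights=[0.0] * 6, gate_bias=0.0)
    print("fusion coefficient:", alpha)
```

With all gate parameters set to zero, the coefficient is exactly 0.5 and the fusion reduces to the plain averaging that the abstract criticises; nonzero gate parameters let the coefficient vary with the two predicted distributions. In a trained model, the order logits and gate parameters would be learned by backpropagation rather than set by hand.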

CLC number: