Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (12): 20-29. doi: 10.12141/j.issn.1000-565X.220055

Special Issue: 2022 Computer Science and Technology

• Computer Science & Technology •

Two-Stream Adaptive Attention Graph Convolutional Networks for Action Recognition

DU Qiliang1,2, XIANG Zhaoyi1, TIAN Lianfang1,2, YU Lubin1

    1. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
    2. China-Singapore International Joint Research Institute, South China University of Technology, Guangzhou 510555, Guangdong, China
    3. Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education, South China University of Technology, Guangzhou 510640, Guangdong, China
    4. Research Institute of Modern Industrial Innovation, South China University of Technology, Zhuhai 519170, Guangdong, China
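  • Received: 2022-02-11 Online: 2022-12-25 Published: 2022-04-08
  • Contact: DU Qiliang (b. 1980), male, Ph.D., associate research fellow, whose research focuses on pattern recognition and machine vision. E-mail: qldu@scut.edu.cn
  • About author: DU Qiliang (b. 1980), male, Ph.D., associate research fellow, whose research focuses on pattern recognition and machine vision.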
  • Supported by:
    the Guangdong Provincial Special Project for the Development of Ocean Economy (GDNRC[2020]018); the Key-Area R&D Project of Guangdong Province (2019B020214001)

Abstract:

Human action recognition has received much attention in computer vision because of its important role in public safety. However, when fusing the neighborhood features of multi-scale nodes, existing graph convolutional networks usually sum them directly, which gives every feature the same importance. This makes it difficult to focus on the important features and hinders the establishment of optimal relationships between nodes. In addition, the usual two-stream fusion method, which averages the prediction results of different models, ignores potential differences in their data distributions, so the fusion effect is poor. To address these problems, this paper proposes a two-stream adaptive attention graph convolutional network for human action recognition. Firstly, a multi-order adjacency matrix with adaptively balanced weights is designed to focus the model on the more important neighborhoods. Secondly, a multi-scale spatio-temporal self-attention module and a channel attention module are designed to enhance the feature extraction capability of the model. Finally, a two-stream fusion network is proposed, which improves the fusion effect by using the data distribution of the two streams' prediction results to determine the fusion coefficients. On the cross-subject and cross-view subsets of NTU RGB+D, the recognition accuracy of the algorithm is 92.3% and 97.5%, respectively, and on the Kinetics-Skeleton dataset it reaches 39.8%. All three results are higher than those of the existing algorithms, indicating the superiority of the algorithm in human action recognition.
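The two contrasts drawn in the abstract (direct summation versus adaptively weighted multi-order neighborhoods, and plain averaging versus distribution-dependent two-stream fusion) can be illustrated with a minimal sketch. It is not the paper's implementation, since the abstract gives no formulas. The following are assumptions made only for illustration: the multi-order adjacency is taken as the matrix powers I, A, A^2; the order weights are a softmax over hypothetical learnable logits (`order_logits`); and the paper's learned fusion network is replaced by a simple entropy-based heuristic that gives the more confident stream a larger coefficient. All function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def adjacency_powers(adj, max_order):
    """Return [I, A, A^2, ..., A^max_order] (assumed multi-order form)."""
    n = len(adj)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    powers = [identity]
    for _ in range(max_order):
        powers.append(matmul(powers[-1], adj))
    return powers


def fuse_orders(powers, features, weights):
    """Weighted sum over orders k of (A^k @ features)."""
    out = [0.0] * len(features)
    for w, a_k in zip(weights, powers):
        aggregated = matvec(a_k, features)
        out = [o + w * g for o, g in zip(out, aggregated)]
    return out


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_streams(scores_a, scores_b):
    """Stand-in for a learned fusion network: the stream whose prediction
    has lower entropy (is more confident) gets the larger coefficient."""
    p_a, p_b = softmax(scores_a), softmax(scores_b)
    coeff_a, coeff_b = softmax([-entropy(p_a), -entropy(p_b)])
    fused = [coeff_a * x + coeff_b * y for x, y in zip(p_a, p_b)]
    return fused, (coeff_a, coeff_b)


# Toy skeleton: a chain of three joints, 0 - 1 - 2, one scalar feature each.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [1.0, 2.0, 3.0]
powers = adjacency_powers(adj, max_order=2)

# Direct summation: every order contributes with the same weight.
direct = fuse_orders(powers, features, [1.0, 1.0, 1.0])

# Adaptive weighting: hypothetical logits that training would adjust.
order_logits = [0.0, 1.0, -1.0]
order_weights = softmax(order_logits)
adaptive = fuse_orders(powers, features, order_weights)

# Two-stream fusion on toy class scores (stream A is the confident one).
stream_a = [4.0, 0.5, 0.2]
stream_b = [1.0, 0.9, 0.8]
fused, coeffs = fuse_streams(stream_a, stream_b)

print("direct:", direct)
print("adaptive:", adaptive)
print("fusion coefficients:", coeffs)
```

In this sketch, raising one entry of `order_logits` shifts the aggregation toward that neighborhood order, which is the behavior direct summation cannot express. Similarly, the fusion coefficients vary per sample instead of being fixed at 0.5 each.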

Key words: action recognition, graph neural network, adjacency matrix, attention, two-stream fusion

CLC Number: