Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (6): 138-147. doi: 10.12141/j.issn.1000-565X.230420

Special Issue: Computer Science and Technology (2024)

• Computer Science & Technology •

Hand Pose Estimation Based on Prior Knowledge and Mesh Supervision

SUN Digang, ZHANG Ping

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2023-06-20 Online: 2024-06-10 Published: 2023-12-27
  • Contact: ZHANG Ping E-mail: cssundg@mail.scut.edu.cn; pzhang@scut.edu.cn
  • Supported by:
    the Key-Area Research and Development Program of Guangdong Province (2019B090915002)

Abstract:

Because of hand self-occlusion and the lack of depth information, 3D hand pose estimation from monocular RGB images estimates the relative depth of joints inaccurately, and the resulting hand poses often violate the biomechanical constraints of the hand. To address this problem, this paper proposes a deep neural network based on prior knowledge and mesh supervision, which combines the prior knowledge contained in the hand structure with hand mesh information. The articulated structure of the hand skeleton implies a specific relationship between the projection of the 3D hand pose onto the 2D image plane and its projection along the depth direction, but differences in hand structure between individuals make this relationship difficult to describe intuitively and formally, so this paper proposes to fit it through learning. Specific relationships also exist between the joint positions and bone lengths of the same finger, between the bending directions of different segments of the same finger, and between the bending directions of different fingers; these relationships are formulated as loss functions to supervise network training. The proposed network generates hand meshes in parallel with hand poses and uses mesh annotations to supervise training, which optimizes pose estimation without increasing network complexity. The network is also trained on a mixed dataset to improve its generalization capability. Experimental results show that the proposed method outperforms other methods in internal cross-validation accuracy on multiple datasets, in cross-dataset validation accuracy, and in the time and space complexity of the model. These results indicate that prior knowledge of the hand skeleton and mesh supervision improve the accuracy of pose estimation while keeping the neural network compact.
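The skeleton priors mentioned in the abstract (bone lengths of a finger, and consistent bending directions across the segments of a finger) can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so the two functions below are assumptions for illustration only and are not the authors' formulation: `bone_length_loss` and `bending_direction_loss` are hypothetical names, a finger is assumed to be a root-to-tip list of four 3D joint positions, and the bending direction at a joint is assumed to be the cross product of the two bones that meet there.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bones(finger):
    """Bone vectors between consecutive joints of one finger (root to tip)."""
    return [sub(finger[i + 1], finger[i]) for i in range(len(finger) - 1)]

def bone_length_loss(pred, target):
    """Mean absolute difference between predicted and reference bone lengths.

    Assumed form of a bone-length prior, not taken from the paper.
    """
    diffs = [abs(norm(p) - norm(t)) for p, t in zip(bones(pred), bones(target))]
    return sum(diffs) / len(diffs)

def bending_direction_loss(finger, eps=1e-8):
    """Penalize consecutive segments of one finger that bend in opposite directions.

    The bending normal at a joint is the cross product of the two bones meeting
    there. A negative cosine between consecutive normals is penalized; aligned
    normals cost nothing. Assumed form, not taken from the paper.
    """
    b = bones(finger)
    normals = [cross(b[i], b[i + 1]) for i in range(len(b) - 1)]
    loss = 0.0
    for n1, n2 in zip(normals, normals[1:]):
        denom = norm(n1) * norm(n2)
        if denom < eps:  # a straight joint has no defined bending direction
            continue
        loss += max(0.0, -dot(n1, n2) / denom)
    return loss
```

In this sketch, a finger that curls consistently in one plane incurs no bending penalty, while a finger whose last segment bends backwards against the previous one is penalized. In a training setup, terms of this kind would be added to the pose loss with weights chosen by the practitioner.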

Key words: hand pose estimation, hand shape estimation, prior knowledge, hand mesh
