Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (6): 138-147. doi: 10.12141/j.issn.1000-565X.230420

• Computer Science & Technology •

Hand Pose Estimation Based on Prior Knowledge and Mesh Supervision

SUN Digang, ZHANG Ping

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2023-06-20 Online: 2024-06-25 Published: 2023-12-27
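  • Contact: ZHANG Ping (born 1964), male, Ph.D., professor; his research covers intelligent robotics and intelligent networked manufacturing technology. E-mail: pzhang@scut.edu.cn
  • About author: SUN Digang (born 1981), male, Ph.D. candidate; his research covers deep learning, computer vision, and intelligent human-computer interaction. E-mail: cssundg@mail.scut.edu.cn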
  • Supported by:
    the Key-Area Research and Development Program of Guangdong Province (2019B090915002)

Abstract:

Because of hand self-occlusion and the lack of depth information, 3D hand pose estimation from monocular RGB images estimates the relative depth of joints inaccurately, and the generated poses can violate the biomechanical constraints of the hand. To address this problem, this paper proposes a deep neural network based on prior knowledge and mesh supervision, which combines the prior knowledge contained in the hand structure with hand mesh information. The articulated structure of the hand skeleton implies a specific relationship between the projection of the 3D hand pose onto the 2D image plane and its projection along the depth direction, but differences in hand structure between individuals make this relationship difficult to describe intuitively and formally, so the paper fits it through learning. Specific relationships also exist between joint positions and bone lengths of the same finger, between the bending directions of different segments of the same finger, and between the bending directions of different fingers; these are formulated as loss functions to supervise network training. The proposed network generates hand meshes in parallel with hand poses and is supervised with mesh annotations, which optimizes pose estimation without increasing network complexity. The network is also trained on a mixed dataset to further improve its generalization capability. Experimental results show that the proposed method outperforms other methods in within-dataset cross-validation accuracy on multiple datasets, in cross-dataset validation accuracy, and in the time and space complexity of the model. These results indicate that prior knowledge of the hand skeleton and mesh supervision improve pose estimation accuracy while keeping the network compact.
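
The abstract names bone-length and bending-direction relationships as loss terms but does not give their formulas. The following is only a minimal NumPy sketch of how such terms might look, not the authors' formulation. The 21-joint layout in `FINGERS`, the function names, and the choice of penalties (absolute bone-length difference, and a hinge on the dot product of consecutive bending axes) are all assumptions made for illustration.

```python
import numpy as np

# Assumed (hypothetical) 21-joint layout: joint 0 is the wrist, followed by
# four joints per finger, ordered from the palm to the fingertip.
FINGERS = [
    [0, 1, 2, 3, 4],      # thumb
    [0, 5, 6, 7, 8],      # index
    [0, 9, 10, 11, 12],   # middle
    [0, 13, 14, 15, 16],  # ring
    [0, 17, 18, 19, 20],  # little
]


def bone_vectors(joints):
    """Return a (5, 4, 3) array of bone vectors from a (21, 3) joint array."""
    chains = joints[np.array(FINGERS)]  # (5, 5, 3): joints along each finger
    return chains[:, 1:] - chains[:, :-1]


def bone_length_loss(pred, ref):
    """Mean absolute difference between predicted and reference bone lengths."""
    pred_len = np.linalg.norm(bone_vectors(pred), axis=-1)
    ref_len = np.linalg.norm(bone_vectors(ref), axis=-1)
    return float(np.mean(np.abs(pred_len - ref_len)))


def bending_direction_loss(pred, eps=1e-8):
    """Penalize fingers whose consecutive segments bend in opposite directions.

    The bending axis at a joint is taken as the cross product of the two
    bones meeting there. Only the three finger segments are used (the
    wrist-to-base bone is skipped), giving two axes per finger.
    """
    segments = bone_vectors(pred)[:, 1:]                  # (5, 3, 3)
    axes = np.cross(segments[:, :-1], segments[:, 1:])    # (5, 2, 3)
    axes = axes / (np.linalg.norm(axes, axis=-1, keepdims=True) + eps)
    agreement = np.sum(axes[:, 0] * axes[:, 1], axis=-1)  # (5,), in [-1, 1]
    # Hinge: no penalty when both joints bend the same way.
    return float(np.mean(np.maximum(0.0, -agreement)))
```

In a training setup these terms would typically be written in the framework used for the network and added, with weights, to the pose and mesh losses; the weights and the exact combination are not specified in the abstract.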

Key words: hand pose estimation, hand shape estimation, prior knowledge, hand mesh

CLC Number: