Journal of South China University of Technology (Natural Science Edition) ›› 2020, Vol. 48 ›› Issue (1): 139-146. doi: 10.12141/j.issn.1000-565X.190287

• Electronics, Communication & Automation Technology •

Lip Motion and Voice Consistency Recognition based on Specific Vowel Pronunciation Events Analysis

ZHU Zhengyu1,2, QIU Huayu2, YANG Chunling1, WANG Yong2

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
  • Received: 2019-05-16 Revised: 2019-07-03 Online: 2020-01-25 Published: 2019-12-01
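  • Contact: WANG Yong (born 1976), male, postdoctoral researcher, associate professor; research interests: speech signal processing and information hiding. E-mail: isswy@mail.sysu.edu.cn
  • About author: ZHU Zhengyu (born 1984), male, postdoctoral researcher, lecturer; research interests: audio-visual multimodal signal processing. E-mail: zhuzhengyu0701@163.com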
  • Supported by: the National Natural Science Foundation of China (61672173)

Abstract: Traditional lip motion and voice consistency recognition methods analyze the whole sentence without filtering its content, which is computationally complex and leaves the results vulnerable to weakly correlated segments such as silence. Vowels with significant lip shape changes were therefore studied in depth. By analyzing the audio and lip motion correlation of each vowel category, clustered by lip sequence features, a more representative specific phonological pronunciation unit was selected as the analysis object. Combined with audio-visual delay analysis, a consistency recognition method based on specific vowel pronunciation event analysis was proposed. First, the selected units were segmented and identified. Then the correlation degree of each specific vowel event was obtained, and the delay distribution at the position of each specific vowel occurrence was statistically scored. Finally, a consistency judgment was made by combining the audio-visual correlation score of the vowel pronunciation events with the position delay analysis score. Experimental comparison with other methods shows that the proposed method achieves better recognition performance and reduces the amount of computation.
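
The abstract does not state the correlation measure, the delay statistic, or the fusion rule, so the following is only a minimal Python sketch of the general idea under stated assumptions: each vowel event is represented by one audio feature sequence and one lip feature sequence of equal length, correlation is measured with Pearson's coefficient, the delay is the lag that maximizes that correlation, and the two scores are fused by a weighted sum. The function names, the lag window, and the weight are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(audio, lip, max_lag):
    """Return (lag, corr): the lag in [-max_lag, max_lag] that maximizes the
    correlation between audio[t] and lip[t + lag], and that correlation.
    Assumes len(audio) == len(lip) and max_lag < len(audio)."""
    candidates = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = audio[:len(audio) - lag], lip[lag:]
        else:
            a, v = audio[-lag:], lip[:len(lip) + lag]
        candidates.append((pearson(a, v), lag))
    corr, lag = max(candidates, key=lambda c: c[0])
    return lag, corr


def consistency_score(events, max_lag=3, lag_window=(0, 2), weight=0.5):
    """Fuse a correlation score and a delay score over all vowel events.

    events: list of (audio_seq, lip_seq) pairs, one per vowel event.
    lag_window: hypothetical range of lags treated as plausible.
    weight: hypothetical fusion weight for the correlation score.
    """
    corrs, hits = [], []
    for audio, lip in events:
        lag, corr = best_lag(audio, lip, max_lag)
        corrs.append(max(corr, 0.0))  # negative correlation counts as no support
        hits.append(1.0 if lag_window[0] <= lag <= lag_window[1] else 0.0)
    return weight * mean(corrs) + (1 - weight) * mean(hits)


# Toy event: the lip sequence is the audio sequence delayed by one frame.
audio = [0, 1, 4, 9, 4, 1, 0, 0]
lip = [0, 0, 1, 4, 9, 4, 1, 0]
print(best_lag(audio, lip, max_lag=3))
print(consistency_score([(audio, lip)]))
```

A decision would then compare the fused score against a threshold tuned on labeled consistent and inconsistent recordings; that threshold is likewise not given in the abstract.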

Key words: lip motion and voice consistency recognition method, vowel pronunciation events, correlation of lip motion and voice consistency, vowel segmentation
