Journal of South China University of Technology (Natural Science Edition) ›› 2020, Vol. 48 ›› Issue (1): 139-146. doi: 10.12141/j.issn.1000-565X.190287

• Electronics, Communication & Automatic Control •

Speech and Lip-Motion Consistency Judgment Method Based on the Analysis of Specific Vowel Pronunciation Events

ZHU Zhengyu (朱铮宇)1,2 QIU Huayu (邱华愉)2 YANG Chunling (杨春玲)1 WANG Yong (王泳)2†

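  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
  • Received: 2019-05-16 Revised: 2019-07-03 Online: 2020-01-25 Published: 2019-12-01
  • Corresponding author: WANG Yong (b. 1976), male, postdoctoral researcher, associate professor; main research interests: speech signal processing and information hiding. E-mail: isswy@mail.sysu.edu.cn
  • About the author: ZHU Zhengyu (b. 1984), male, postdoctoral researcher, lecturer; main research interest: audio-visual multimodal signal processing. E-mail: zhuzhengyu0701@163.com
  • Funding:
    National Natural Science Foundation of China (61672173); Young Innovative Talents Project of Guangdong Provincial Universities (2018KQNCX140); Characteristic Innovation Project of Guangdong Provincial Universities (2015KTSCX083)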

Lip Motion and Voice Consistency Recognition based on Specific Vowel Pronunciation Events Analysis

ZHU Zhengyu1,2 QIU Huayu2 YANG Chunling1 WANG Yong2

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
  • Received: 2019-05-16 Revised: 2019-07-03 Online: 2020-01-25 Published: 2019-12-01
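  • Contact: WANG Yong (b. 1976), male, postdoctoral researcher, associate professor; main research interests: speech signal processing and information hiding. E-mail: isswy@mail.sysu.edu.cn
  • About author: ZHU Zhengyu (b. 1984), male, postdoctoral researcher, lecturer; main research interest: audio-visual multimodal signal processing. E-mail: zhuzhengyu0701@163.com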
  • Supported by:
    National Natural Science Foundation of China (61672173)

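Abstract: Existing consistency judgment methods mainly analyze whole sentences (or passages) without screening the content to be analyzed, so they are computationally cumbersome and their results are easily affected by weakly correlated segments such as silence. To address these shortcomings, this work focuses on vowel pronunciation units with pronounced lip-shape changes. By analyzing the audio-lip correlation of each vowel category after clustering, more representative specific vowel units are selected and combined with position delay analysis, and an audio-lip consistency judgment method based on the analysis of specific vowel pronunciation events is proposed. The method first segments and identifies the specific vowel units; it then computes the audio-lip correlation of each of these vowel pronunciation events and analyzes the delay distribution at the positions where the specific vowels occur; finally, it fuses the audio-lip correlation score of the specific vowel events with the position delay analysis score to make the consistency judgment. Experimental comparison with other methods shows that the algorithm outperforms several whole-sentence comparison algorithms in recognition performance while also reducing the amount of computation.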

Keywords: audio-lip consistency judgment method, vowel pronunciation events, audio-lip correlation, vowel segmentation

Abstract: Traditional lip motion and voice consistency recognition methods analyze the whole sentence without filtering its content, which makes them computationally complicated and leaves their results vulnerable to weakly correlated segments such as silence. This work focuses on vowels with significant lip-shape changes. By analyzing the audio and lip motion correlation of each vowel category clustered by lip sequence features, more representative specific vowel pronunciation units were selected as the analysis object. Combined with audio-visual delay analysis, a consistency recognition method based on specific vowel pronunciation events analysis was proposed. First, the selected units are segmented and identified. Then the audio-visual correlation degree of each specific vowel event is obtained, and the delay distribution of the positions where each specific vowel occurs is statistically scored. Finally, a consistency judgment is made by combining the audio-visual correlation score of the vowel pronunciation events with the position delay analysis score. Experimental comparisons with other methods show that the proposed method achieves better recognition performance and reduces the amount of computation.

Key words: lip motion and voice consistency recognition method, vowel pronunciation events, correlation of lip motion and voice, vowel segmentation
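
The three-stage decision outlined in the abstract (per-event correlation, position delay analysis, score fusion) could be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the correlation measure, the delay scoring rule, or the fusion rule, so a clipped Pearson correlation, a lag search over a few frames, and a weighted sum are swapped in here. All function names, weights, and thresholds (`event_correlation`, `event_lag`, `consistency_decision`, `w_corr`, `lag_tol`, `threshold`) are hypothetical, and the vowel events are assumed to be already segmented into paired 1-D audio and lip feature tracks.

```python
import numpy as np


def event_correlation(audio, lip):
    """Pearson correlation of two equal-length 1-D tracks, clipped to [0, 1].

    Assumption: a simple stand-in for the paper's audio-lip correlation measure.
    """
    a = np.asarray(audio, dtype=float)
    v = np.asarray(lip, dtype=float)
    if len(a) < 2 or a.std() == 0 or v.std() == 0:
        return 0.0  # too short or flat: no evidence of correlation
    return float(max(np.corrcoef(a, v)[0, 1], 0.0))


def event_lag(audio, lip, max_lag=3):
    """Lag in frames (positive: lip trails audio) that maximizes the correlation."""
    a = np.asarray(audio, dtype=float)
    v = np.asarray(lip, dtype=float)
    best_lag, best_score = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], v[lag:]
        else:
            x, y = a[-lag:], v[:len(v) + lag]
        score = event_correlation(x, y)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def consistency_decision(events, w_corr=0.7, max_lag=3, lag_tol=1, threshold=0.5):
    """Fuse correlation and delay scores over all vowel events.

    events: list of (audio_track, lip_track) pairs, one per vowel event.
    Returns (is_consistent, fused_score). Weights and threshold are illustrative.
    """
    if not events:
        return False, 0.0
    # Mean zero-lag correlation across the selected vowel events.
    corr_score = np.mean([event_correlation(a, v) for a, v in events])
    # Fraction of events whose best-matching lag stays within the tolerance.
    lags = [event_lag(a, v, max_lag) for a, v in events]
    delay_score = np.mean([abs(lag) <= lag_tol for lag in lags])
    fused = w_corr * corr_score + (1.0 - w_corr) * delay_score
    return bool(fused >= threshold), float(fused)
```

Restricting the computation to a handful of short vowel events, rather than every frame of the utterance, is what would give the reduction in computation that the abstract reports; the sketch makes that visible because the cost scales with the number and length of the events passed in.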

CLC number: