Two-Stage Decision-Based Detection of Non-Lexical Audio Events in Spontaneous Vocalization

doi:10.3969/j.issn.1000-565X.2011.02.004

Journal of South China University of Technology (Natural Science Edition), 2011, Vol. 39, Issue 2: 20-25, 31.

• Electronics, Communication and Automatic Control •

He Qian-hua, Li Yan-xiong, Li Tao, Zhang Hong, Yang Ji-chen

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 51064, Guangdong, China

Received: 2010-04-16 Revised: 2010-06-17 Online: 2011-02-25 Published: 2011-01-02
Corresponding author: He Qian-hua (born 1965), male, professor and doctoral supervisor, working mainly on speech and audio signal processing and embedded systems. E-mail: eeqhhe@scut.edu.cn
Supported by: the National Natural Science Foundation of China (60972132) and the Natural Science Foundation of Guangdong Province (10451064101004651, 9351064101000003)

Abstract

To make effective use of non-lexical audio events in the semantic analysis of conversational speech, this paper analyzes the differences in characteristics among the audio events that occur frequently in spontaneous speech and proposes a two-stage decision-based method for detecting them. The method uses the signal characteristics of audio events to construct audio event segments, detects long applause with a threshold decision (the first-stage decision), and detects the other audio events with statistical models (the second-stage decision). Experimental results show that, for three non-lexical audio events (filled pause, laughter and applause), the method achieves an average precision, recall and F1-measure of 87.3%, 93.8% and 90.4%, respectively. Compared with results reported in the existing literature, the F1-measure improves by 7.5% on average, and the method locates the boundaries of non-lexical audio events more precisely.

Key words: non-lexical audio events, threshold decision, statistical model detection, spontaneous speech, speech processing
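The two-stage decision described in the abstract can be illustrated with a minimal sketch. The abstract does not give the features, thresholds or model types, so everything named here is an assumption made for illustration: the `Segment` fields, the `applause_score` feature, the `min_applause_sec` duration limit and the per-class scoring functions standing in for the statistical models. The `f1_measure` helper is the standard harmonic mean of precision and recall; applying it to the reported average precision and recall gives about 0.904, which is consistent with the reported average F1-measure, although the paper may have averaged per-event F1 values instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    """A candidate audio event segment (all fields are hypothetical)."""
    start: float            # segment start time in seconds
    end: float              # segment end time in seconds
    applause_score: float   # assumed applause-likeness feature in [0, 1]
    features: List[float]   # assumed feature vector for the statistical models


def two_stage_detect(
    segments: List[Segment],
    applause_threshold: float,
    min_applause_sec: float,
    models: Dict[str, Callable[[List[float]], float]],
) -> List[str]:
    """Label each segment with a threshold stage followed by a model stage."""
    labels = []
    for seg in segments:
        duration = seg.end - seg.start
        # Stage 1: threshold decision, applied only to long applause candidates.
        if duration >= min_applause_sec and seg.applause_score >= applause_threshold:
            labels.append("applause")
            continue
        # Stage 2: choose the class whose (stand-in) statistical model scores highest.
        labels.append(max(models, key=lambda name: models[name](seg.features)))
    return labels


def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy scoring functions standing in for trained statistical models.
    toy_models = {
        "filled_pause": lambda f: -abs(f[0] - 0.2),
        "laughter": lambda f: -abs(f[0] - 0.8),
    }
    toy_segments = [
        Segment(0.0, 3.0, 0.9, [0.1]),
        Segment(3.0, 3.5, 0.9, [0.8]),
    ]
    print(two_stage_detect(toy_segments, 0.5, 1.0, toy_models))
    print(round(f1_measure(0.873, 0.938), 3))  # 0.904
```

Handling long applause with a simple threshold first keeps the second stage focused on the events that are harder to separate, which is the division of labor the abstract describes.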

He Qian-hua, Li Yan-xiong, Li Tao, Zhang Hong, Yang Ji-chen. Two-Stage Decision-Based Detection of Non-Lexical Audio Events in Spontaneous Vocalization[J]. Journal of South China University of Technology (Natural Science Edition), 2011, 39(2): 20-25, 31.
