Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary

doi:10.12141/j.issn.1000-565X.220740

Journal of South China University of Technology (Natural Science Edition), 2023, Vol. 51, Issue (8): 118-125.

Special topic: 2023 Electronics, Communication and Automatic Control

• Electronics, Communication and Automatic Control •

MA Xiaoliang¹,²,³  AN Lingling¹  DENG Congjian¹,³,⁴  DU Dequan²,³  ZHANG Guoxin⁵

1. Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, Guangdong, China
2. Guangzhou Branch of China Telecom Co., Ltd., Guangzhou 510620, Guangdong, China
3. Ma Xiaoliang's Model Worker and Innovative Craftsman Workshop, Guangzhou 510620, Guangdong, China
4. Guangzhou Yunqu Information Technology Co., Ltd., Guangzhou 510665, Guangdong, China
5. Guangdong Branch of China Telecom Co., Ltd., Guangzhou 510080, Guangdong, China

Received: 2022-11-10  Published: 2023-08-25  Released online: 2023-03-01
Corresponding author: MA Xiaoliang (born 1973), male, doctoral candidate, senior engineer, chair professor at the School of Business Administration, South China University of Technology. His research interests include AI, NLP, dialect processing, telecom operator customer service operations, and data security protection. E-mail: maxiaol.gd@chinatelecom.cn
Supported by: the National Key Research and Development Program of China (2022YFB3102700); the Key Program of the National Natural Science Foundation of China (62132013)

Abstract

Automatic speech recognition (ASR) technology is now relatively mature, and general-purpose ASR engines are widely used in transportation, healthcare, communications and other industries. However, because industry-specific vocabulary is not independently and identically distributed in large-scale training corpora, general-purpose ASR engines recognize such vocabulary with low accuracy when they transcribe speech in individual industry segments. Compared with the 16 kHz audio sampling rate of Internet environments, call-center speech is narrowband and sampled at only 8 kHz, so the loss of transcription accuracy is especially pronounced. To improve the transcription accuracy of industry vocabulary, this paper proposes a post-transcription optimization technique for ASR based on an industry-specific vocabulary. First, a convolutional neural network model and the deep neural network BERT model are each used to perform predictive word segmentation on the corpus text, and an industry-specific error correction vocabulary is generated. Next, in the production environment, a general-purpose ASR engine performs the initial transcription of the telephone call audio. Then, the transcribed text is corrected by the Soft-Masked BERT model combined with the error correction vocabulary, which improves speech recognition accuracy. Training and testing on customer service call audio from the Guangzhou 12345 hotline show that the proposed technique raises the transcription accuracy of general-purpose ASR engines on industry terms by about 10 percentage points, with fast error correction and good applicability.

Key words: text error correction, speech recognition, customer service calls, industry-specific error correction vocabulary, convolutional neural network

CLC number: TP391.1

Cite this article: MA Xiaoliang, AN Lingling, DENG Congjian, et al. Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary[J]. Journal of South China University of Technology (Natural Science Edition), 2023, 51(8): 118-125.

Figures and tables (10): Figures 1-3, Tables 1-7


Metric	Result/%	Metric	Result/%
Accuracy	87.2	Recall	90.9
Precision	93.0	F₁	92.0

Long-to-short sentence ratio	Accuracy/%	Precision/%	Recall/%	F₁/%
7:3	77.4	78.0	68.9	73.1
8:2	85.1	84.0	67.1	74.6
9:1	78.6	76.6	69.2	72.7

Masking rate/%	Accuracy/%	Precision/%	Recall/%	F₁/%
10	83.9	80.5	60.0	68.8
15	85.1	84.0	67.1	74.6
20	82.3	81.7	61.7	70.3

Dropout rate/%	Accuracy/%	Precision/%	Recall/%	F₁/%
10	85.1	84.0	67.1	74.6
30	73.5	81.1	66.7	73.2
50	73.8	81.6	66.7	73.4

Learning rate	Accuracy/%	Precision/%	Recall/%	F₁/%
5×10^-5	73.4	81.2	64.6	71.9
1×10^-4	85.1	84.0	67.1	74.6
5×10^-4	72.3	81.8	67.7	74.1
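
The F₁ values reported in the tables above are consistent, up to rounding, with the standard harmonic mean of precision and recall. The following is a minimal sketch for checking a row. The function name `f1_score` is an illustrative choice rather than anything from the paper, and the inputs are the rounded percentages from the tables, so differences of about 0.1 from the reported values can occur.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), all in percent, copied from the tables.
rows = [
    (84.0, 67.1, 74.6),
    (76.6, 69.2, 72.7),
    (80.5, 60.0, 68.8),
]

for precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    print(f"P={precision} R={recall} F1={computed:.1f} (reported {reported})")
```

Because F₁ is a harmonic mean, it is pulled toward the lower of the two inputs, which is why rows with recall near 67% stay around 74% even when precision reaches 84%.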


ASR engine	Insertion error rate/%	Deletion error rate/%	Substitution error rate/%	Character accuracy/%
THASR	4.55	2.55	15.56	81.78
ALIASR	2.53	7.09	5.96	84.59
THASR-E	1.47	1.77	6.35	91.85
ALIASR-E	0.50	1.82	4.10	94.07
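
Insertion, deletion and substitution rates of this kind are usually obtained from a minimum-edit-distance alignment between a reference transcript and the ASR output. The sketch below shows one way to count them at the character level. The function names are hypothetical, and the accuracy formula 1 − (S + D + I)/N is one common convention assumed here; the page does not state which definition the table uses.

```python
def align_counts(reference: str, hypothesis: str) -> dict:
    """Count substitutions, deletions and insertions on a minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total edits, subs, dels, ins) for reference[:i] vs hypothesis[:j].
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, k)          # match, no edit
            else:
                diagonal = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cost[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            # Tuples compare on total edits first; ties fall back to tuple order.
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cost[n][m]
    return {"sub": subs, "del": dels, "ins": ins, "n": n}


def error_rates(reference: str, hypothesis: str) -> dict:
    """Per-character error rates in percent (assumed accuracy convention)."""
    c = align_counts(reference, hypothesis)
    n = max(c["n"], 1)  # avoid division by zero for an empty reference
    return {
        "insertion_rate": 100 * c["ins"] / n,
        "deletion_rate": 100 * c["del"] / n,
        "substitution_rate": 100 * c["sub"] / n,
        "char_accuracy": 100 * (n - c["sub"] - c["del"] - c["ins"]) / n,
    }


# Corrected sentence used as reference, raw ASR output as hypothesis.
print(error_rates("那这边案件已经移交到荔湾区", "那这边按键已经移交到荔湾区"))
```

On this pair the alignment finds two substitutions among 13 reference characters and no insertions or deletions.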

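Example No.	General ASR transcription	Transcription after error correction	Correction (English gloss)
1	那这边按键已经移交到荔湾区	那这边案件已经移交到荔湾区	按键 (key press) → 案件 (case)
2	这边搜索到你的工单了，你的工单呢，已经处于一个归档半截的一个状态，并且有个处理结果，我这边读一下给您看可以吗？	这边搜索到你的工单了，你的工单呢，已经处于一个归档办结的一个状态，并且有个处理结果，我这边读一下给您看可以吗？	半截 (half section) → 办结 (handled and closed)
3	如果你们需要财政的话	如果你们需要采证的话	财政 (finance) → 采证 (evidence collection)
4	是天河区郑家对吧	是天河区正佳对吧	郑家 (Zheng family) → 正佳 (Zhengjia, a proper name)
5	产生宽带广州有很多个营业厅的嘛，就是看你报装那个地址是哪个营业厅	长城宽带广州有很多个营业厅的嘛，就是看你报装那个地址是哪个营业厅	产生 (produce) → 长城 (Great Wall, as in Great Wall Broadband)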
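
The examples above are all homophone or near-homophone confusions in which a general word replaces a domain term. As a rough illustration of the vocabulary side of the approach only, the sketch below applies a plain phrase lookup. It is not the Soft-Masked BERT model described in the abstract: `CORRECTION_VOCAB` and `correct` are hypothetical names, and the entries are phrases lifted from the table for demonstration. A bare lookup cannot judge context (按键 is a valid word in other sentences), which is presumably why the paper pairs the vocabulary with a contextual model.

```python
from typing import Mapping

# Hypothetical correction vocabulary: misrecognized phrase -> intended domain term.
# Entries are taken from the example table for illustration only.
CORRECTION_VOCAB = {
    "归档半截": "归档办结",
    "天河区郑家": "天河区正佳",
    "产生宽带": "长城宽带",
}


def correct(text: str, vocab: Mapping[str, str]) -> str:
    """Replace every misrecognized phrase found in the text with its domain term."""
    # Longer keys first, so a short key cannot pre-empt a longer phrase.
    for wrong in sorted(vocab, key=len, reverse=True):
        text = text.replace(wrong, vocab[wrong])
    return text


print(correct("是天河区郑家对吧", CORRECTION_VOCAB))
print(correct("产生宽带广州有很多个营业厅的嘛", CORRECTION_VOCAB))
```

Keying the vocabulary on short phrases rather than single words (归档半截 instead of 半截) narrows the replacement to the contexts where the confusion was observed, at the cost of missing the same error in a new context.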
