Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (8): 118-125. doi: 10.12141/j.issn.1000-565X.220740

Special topic: Electronics, Communication and Automatic Control (2023)

• Electronics, Communication and Automatic Control •

Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary

MA Xiaoliang1,2,3 AN Lingling1 DENG Congjian1,3,4 DU Dequan2,3 ZHANG Guoxin5   

  1. Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, Guangdong, China
  2. Guangzhou Branch of China Telecom Co., Ltd., Guangzhou 510620, Guangdong, China
  3. Ma Xiaoliang’s Model Worker and Innovative Craftsman Workshop, Guangzhou 510620, Guangdong, China
  4. Guangzhou Yunqu Information Technology Co., Ltd., Guangzhou 510665, Guangdong, China
  5. Guangdong Branch of China Telecom Co., Ltd., Guangzhou 510080, Guangdong, China
  • Received: 2022-11-10 Published: 2023-08-25 Online: 2023-03-01
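  • Corresponding author: MA Xiaoliang. E-mail: maxiaol.gd@chinatelecom.cn
  • About the author: MA Xiaoliang (born 1973), male, doctoral student, senior engineer, chair professor at the School of Business Administration, South China University of Technology. His research covers AI, NLP, dialect processing, telecom operator customer service operations, and data security protection.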
  • Supported by:
    the National Key Research and Development Program of China (2022YFB3102700); the Key Program of the National Natural Science Foundation of China (62132013)

Abstract:

Automatic speech recognition (ASR) technology is now relatively mature, and general-purpose ASR engines are widely used in transportation, healthcare, telecommunications, and other industries. However, because industry-specific terms are not independently and identically distributed (non-IID) in large-scale training corpora, general-purpose ASR engines recognize such terms with low accuracy when applied to individual industry segments. Compared with the 16 kHz audio sampling rate typical of Internet environments, call-center speech is narrowband and sampled at only 8 kHz, so transcription accuracy degrades even more noticeably. To improve transcription accuracy for industry-specific terms, this paper proposes a post-transcription optimization technique for ASR based on an industry-specific vocabulary. First, a convolutional neural network model and the deep neural network BERT model are each used to perform predictive word segmentation on the corpus text, and an industry-specific error-correction vocabulary is generated. Next, in the production environment, a general-purpose ASR engine produces an initial transcription of the telephone call audio. The transcribed text is then corrected by the Soft-Masked BERT model combined with the error-correction vocabulary, thereby improving recognition accuracy. Training and testing on customer service call recordings from the Guangzhou 12345 hotline show that the proposed technique raises the general-purpose ASR engine's transcription accuracy for industry terms by about 10 percentage points, with fast error correction and good applicability.

Key words: text error correction, speech recognition, customer service calls, industry-specific error-correction vocabulary, convolutional neural network
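
The vocabulary-guided correction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it swaps the Soft-Masked BERT model for plain string similarity from Python's standard `difflib` module, and it uses English words in place of Chinese call transcripts. The function name `correct_transcript`, the sample vocabulary, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in for a learned model)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correct_transcript(text: str, vocabulary: list[str], threshold: float = 0.8) -> str:
    """Replace word windows that closely resemble a vocabulary term.

    Assumption: whitespace-tokenized text; a real Chinese pipeline would
    need word segmentation and a context-aware model instead.
    """
    words = text.split()
    # Try longer terms first so multi-word terms win over shorter ones.
    terms = sorted(vocabulary, key=lambda t: len(t.split()), reverse=True)
    corrected = []
    i = 0
    while i < len(words):
        for term in terms:
            n = len(term.split())
            window = words[i:i + n]
            if len(window) == n and similarity(" ".join(window), term) >= threshold:
                corrected.append(term)  # substitute the vocabulary term
                i += n
                break
        else:
            corrected.append(words[i])  # no near match: keep the original word
            i += 1
    return " ".join(corrected)


# Hypothetical industry vocabulary and a misrecognized transcript.
vocab = ["provident fund", "residence permit"]
print(correct_transcript("how do I withdraw my provident fond", vocab))
# prints "how do I withdraw my provident fund"
```

A similarity threshold keeps ordinary words untouched while still catching near misses. A context-aware model such as the one named in the abstract would additionally decide whether a candidate span is actually an error before replacing it.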

CLC number: