Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary

MA Xiaoliang, AN Lingling, DENG Congjian, et al

doi:10.12141/j.issn.1000-565X.220740

Journal of South China University of Technology (Natural Science)

2023, Vol. 51, Issue 8: 118-125

Electronics, Communication & Automation Technology

1. Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, Guangdong, China
2. Guangzhou Branch of China Telecom Co., Ltd., Guangzhou 510620, Guangdong, China
3. Ma Xiaoliang's Model Worker and Innovative Craftsman Workshop, Guangzhou 510620, Guangdong, China
4. Guangzhou Yunqu Information Technology Co., Ltd., Guangzhou 510665, Guangdong, China
5. Guangdong Branch of China Telecom Co., Ltd., Guangzhou 510080, Guangdong, China

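MA Xiaoliang (born 1973), male, doctoral candidate, senior engineer, and chair professor at the School of Business Administration, South China University of Technology. His research covers AI, NLP, dialect processing, telecom-operator customer service operations, and data security protection.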

Received date: 2022-11-10

Online published: 2023-03-01

Supported by the National Key Research and Development Program of China (2022YFB3102700); the National Natural Science Foundation of China (62132013)

Abstract

Automatic speech recognition (ASR) technology is relatively mature, and general-purpose ASR engines are widely used in transportation, medical care, communications, and other industries. However, owing to the non-independent homology of industry-specific vocabulary in large-scale training corpora, general ASR engines recognize industry-specific vocabulary with low accuracy when they are applied to individual industry subdivisions. Compared with the 16 kHz audio sampling rate typical of Internet environments, the narrowband, low sampling rate (8 kHz) of call centers can reduce ASR recognition accuracy even further. To improve recognition accuracy for industry-specific words, this paper proposes a translation optimization technology for ASR based on industry-specific vocabulary. First, a convolutional neural network model and the BERT deep neural network model are used to predict words in corpus text data, and an industry-specific error-correction vocabulary is generated. Next, in the production environment, a general ASR engine produces an initial transcription of telephone call voice data. Then, the transcribed text is corrected with the Soft-Masked BERT model combined with the industry-specific error-correction vocabulary, which improves recognition accuracy. Finally, modeling and testing on 12345 hotline customer service call voice data show that the proposed technology improves the recognition accuracy of a general ASR engine by 10 percentage points, with high error-correction speed and good applicability.
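
The correction stage described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' method: it replaces the Soft-Masked BERT model with plain longest-match lookup against an error-correction vocabulary, and it measures the effect with character-level edit distance. The function names, the `ERROR_VOCAB` entries, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def correct_transcript(text: str, vocab: Dict[str, str]) -> str:
    """Replace known misrecognitions with the intended industry terms.

    Simplified stand-in for a learned corrector: scans left to right and,
    at each position, applies the longest vocabulary key that matches.
    """
    # Ignore empty keys (they would never advance the scan) and try
    # longer keys first so they win over shorter overlapping ones.
    keys = sorted((k for k in vocab if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(vocab[k])
                i += len(k)
                break
        else:
            # No vocabulary entry matches here; keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def char_accuracy(hypothesis: str, reference: str) -> float:
    """Character accuracy, defined here as 1 - edit_distance / len(reference)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - edit_distance(hypothesis, reference) / len(reference)


# Hypothetical vocabulary: misrecognized string -> intended industry term.
ERROR_VOCAB = {
    "broad band": "broadband",
    "fiber two the home": "fiber to the home",
}

if __name__ == "__main__":
    reference = "my broadband is down"
    raw = "my broad band is down"  # pretend output of a general ASR engine
    fixed = correct_transcript(raw, ERROR_VOCAB)
    print(char_accuracy(raw, reference), char_accuracy(fixed, reference))
```

A lookup table only repairs errors it has already seen, whereas a context-aware model such as the Soft-Masked BERT named in the abstract can use the surrounding text to decide whether a correction applies. The sketch is meant to show where the vocabulary enters the pipeline, not to approximate the reported 10-percentage-point gain.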

Key words: text error correction; speech recognition; customer service calls; industry-specific vocabulary; convolutional neural network

Cite this article

MA Xiaoliang, AN Lingling, DENG Congjian, et al. Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary[J]. Journal of South China University of Technology (Natural Science), 2023, 51(8): 118-125. DOI: 10.12141/j.issn.1000-565X.220740
