Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (8): 118-125. doi: 10.12141/j.issn.1000-565X.220740

Special Issue: 2023 Electronics, Communication and Automatic Control

• Electronics, Communication & Automation Technology •

Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary

MA Xiaoliang1,2,3 AN Lingling1 DENG Congjian1,3,4 DU Dequan2,3 ZHANG Guoxin5   

  1. Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, Guangdong, China
    2. Guangzhou Branch of China Telecom Co., Ltd., Guangzhou 510620, Guangdong, China
    3. Ma Xiaoliang’s Model Worker and Innovative Craftsman Workshop, Guangzhou 510620, Guangdong, China
    4. Guangzhou Yunqu Information Technology Co., Ltd., Guangzhou 510665, Guangdong, China
    5. Guangdong Branch of China Telecom Co., Ltd., Guangzhou 510080, Guangdong, China
  • Received:2022-11-10 Online:2023-08-25 Published:2023-03-01
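  • Contact: MA Xiaoliang (born 1973), male, doctoral student, senior engineer, chair professor at the School of Business Administration, South China University of Technology; research interests include AI, NLP, dialect processing, telecom operator customer service operations, and data security protection. E-mail: maxiaol.gd@chinatelecom.cn
  • About author: MA Xiaoliang (born 1973), male, doctoral student, senior engineer, chair professor at the School of Business Administration, South China University of Technology; research interests include AI, NLP, dialect processing, telecom operator customer service operations, and data security protection.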
  • Supported by:
    the National Key Research and Development Program of China (2022YFB3102700); the National Natural Science Foundation of China (62132013)

Abstract:

Automatic speech recognition (ASR) technology has become relatively mature, and general-purpose ASR engines are widely used in transportation, medicine, communications and other industries. However, because of the non-independent homology of industry-specific vocabulary in large-scale training corpora, general ASR engines recognize industry-specific vocabulary with low accuracy when they are applied to specialized industry segments. Compared with the 16 kHz audio sampling rate typical of Internet environments, the narrowband, low sampling rate (8 kHz) of call centers may reduce ASR recognition accuracy even further. To improve the recognition accuracy of industry-specific words, this paper proposes a translation optimization technology for ASR based on industry-specific vocabulary. First, a convolutional neural network model and the deep neural network BERT model are used to perform word prediction on corpus text data, and an industry-specific error correction vocabulary is generated. Next, in the production environment, a general ASR engine performs the initial transcription of telephone call voice data. Then, the transcribed text is corrected using the Soft-Masked BERT model combined with the industry-specific error correction vocabulary, which improves recognition accuracy. Finally, modeling and testing on 12345 hotline customer service call voice data show that the proposed translation optimization technology improves the recognition accuracy of a general ASR engine by 10 percentage points, with high error correction speed and good applicability.
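
The correction stage can be illustrated with a much simpler stand-in. The sketch below is not the Soft-Masked BERT model used in the paper; it swaps in plain fuzzy string matching from the Python standard library to show the general idea of post-correcting an ASR transcript against an industry-specific vocabulary. The vocabulary terms, the function name `correct_transcript`, and the similarity threshold are all hypothetical choices made for illustration, and English text is used only to keep the example short.

```python
from difflib import SequenceMatcher

# Hypothetical industry-specific vocabulary (illustrative terms only,
# not taken from the paper).
VOCABULARY = ["broadband", "fiber optic", "set-top box"]


def correct_transcript(text, vocabulary, threshold=0.8):
    """Replace word n-grams that closely resemble a vocabulary term.

    Assumption: a character-level similarity ratio stands in for the
    learned error detection and correction of a neural model.
    """
    words = text.split()
    corrected = []
    i = 0
    while i < len(words):
        best = None  # (score, term, number_of_words_consumed)
        for term in vocabulary:
            n = len(term.split())
            if i + n > len(words):
                continue
            candidate = " ".join(words[i:i + n])
            score = SequenceMatcher(None, candidate.lower(), term.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, term, n)
        if best is not None:
            # A close match was found: emit the vocabulary term instead.
            corrected.append(best[1])
            i += best[2]
        else:
            # No vocabulary term is close enough: keep the word unchanged.
            corrected.append(words[i])
            i += 1
    return " ".join(corrected)


print(correct_transcript("my broadbend is down", VOCABULARY))
```

In this toy setting the misrecognized word "broadbend" is close enough to the vocabulary entry "broadband" to be replaced, while unrelated words are left alone. A context-aware model such as the one described in the abstract would be needed to handle errors that string similarity alone cannot resolve.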

Key words: text error correction, speech recognition, customer service calls, industry-specific vocabulary, convolutional neural network 
