Translation Optimization Technology of Automatic Speech Recognition Based on Industry-Specific Vocabulary
Received date: 2022-11-10
Online published: 2023-03-01
Supported by the National Key Research and Development Program of China (2022YFB3102700) and the Key Program of the National Natural Science Foundation of China (62132013)
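Automatic speech recognition (ASR) technology is now relatively mature, and general-purpose ASR engines are widely used in industries such as transportation, healthcare and telecommunications. However, industry-specific vocabulary is not independently and identically distributed in large-scale training corpora. As a result, general-purpose ASR engines recognize industry-specific terms with low accuracy when they transcribe speech from specialized sectors. Internet audio is typically sampled at 16 kHz, but call-center telephone speech is narrowband and sampled at only 8 kHz, so the drop in transcription accuracy there is especially pronounced. To improve the transcription accuracy of industry vocabulary, this paper proposes a post-transcription optimization technique for ASR based on an industry vocabulary list. The method has three stages:

1. A convolutional neural network (CNN) model and the deep neural network model BERT are each applied to the corpus text to predict word segmentation, and an industry error-correction vocabulary is generated from the results.
2. In the production environment, a general-purpose ASR engine produces an initial transcription of the telephone call audio.
3. The first-pass transcript is corrected by a Soft-Masked BERT model combined with the error-correction vocabulary, which improves recognition accuracy.

The models were trained and tested on customer-service call recordings from the Guangzhou 12345 hotline. The results show that the proposed post-transcription optimization raises the transcription accuracy of industry terms produced by a general-purpose ASR engine by about 10 percentage points. Correction is also fast, and the technique applies well across settings.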
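The correction stage relies on Soft-Masked BERT (Zhang et al., 2020). In that model, a detection network assigns each character an error probability p_i. The embedding passed to the BERT correction network is then a mix of the [MASK] embedding and the character's own embedding: e'_i = p_i · e_mask + (1 − p_i) · e_i. The minimal Python sketch below shows only this soft-masking step, on toy vectors. The function name `soft_mask` and the example numbers are illustrative assumptions. The sketch does not reproduce the detection network, the BERT encoder, or the way the paper combines the industry vocabulary with the model.

```python
from typing import List

Vector = List[float]


def soft_mask(embeddings: List[Vector], error_probs: List[float],
              mask_embedding: Vector) -> List[Vector]:
    """Return soft-masked embeddings e'_i = p_i * e_mask + (1 - p_i) * e_i.

    embeddings     -- one embedding vector per input character (e_i)
    error_probs    -- detection-network probability that character i is wrong (p_i)
    mask_embedding -- embedding of the [MASK] token (e_mask)
    """
    if len(embeddings) != len(error_probs):
        raise ValueError("exactly one error probability per character is required")
    soft_masked = []
    for e_i, p_i in zip(embeddings, error_probs):
        if not 0.0 <= p_i <= 1.0:
            raise ValueError("error probabilities must lie in [0, 1]")
        if len(e_i) != len(mask_embedding):
            raise ValueError("embedding and mask vectors must have the same size")
        # Interpolate component-wise between the [MASK] vector and the original vector.
        soft_masked.append([p_i * m + (1.0 - p_i) * x
                            for m, x in zip(mask_embedding, e_i)])
    return soft_masked


if __name__ == "__main__":
    # Three toy 2-dimensional character embeddings and their detector scores.
    chars = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    probs = [0.0, 1.0, 0.5]
    print(soft_mask(chars, probs, mask_embedding=[4.0, 4.0]))
```

The endpoints show how the interpolation behaves. A character the detector trusts (p_i = 0) passes through unchanged, and one it considers certainly wrong (p_i = 1) is replaced entirely by the mask embedding. Values in between give a partially masked input instead of a hard [MASK] decision. This is what lets the correction network keep some evidence from uncertain characters.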
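MA Xiaoliang, AN Lingling, DENG Congjian, et al. Translation optimization technology of automatic speech recognition based on industry-specific vocabulary[J]. Journal of South China University of Technology (Natural Science Edition), 2023, 51(8): 118-125. DOI: 10.12141/j.issn.1000-565X.220740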