Journal of South China University of Technology (Natural Science Edition) ›› 2021, Vol. 49 ›› Issue (11): 87-94. doi: 10.12141/j.issn.1000-565X.210173

Special Issue: 2021 Electronics, Communication and Automatic Control

• Electronics, Communication & Automation Technology •

Speech Bandwidth Extension Based on Flatten-CNN

YANG Junmei, LEI Yang, CHEN Xikun

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2021-03-29 Revised: 2021-09-29 Online: 2021-11-25 Published: 2021-11-01
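  • Contact: YANG Junmei (b. 1979), female, Ph.D., associate professor, mainly engaged in research on intelligent information processing technology. E-mail: yjunmei@scut.edu.cn
  • About author: YANG Junmei (b. 1979), female, Ph.D., associate professor, mainly engaged in research on intelligent information processing technology.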
  • Supported by:
    National Natural Science Foundation of China (61871188, 61801133)

Abstract: Existing deep learning-based speech bandwidth extension algorithms have several drawbacks. Time-domain algorithms do not extract speech features accurately enough and require too much training data. Frequency-domain algorithms pay little attention to the correlation between frames when extracting log power spectrum features, have an odd number of points along the frequency axis, which makes it inconvenient to deepen the network, and ignore time-domain information. Algorithms that work in both the time and frequency domains use relatively complicated models. To address these problems, this paper proposes a speech bandwidth extension algorithm based on Flatten-CNN. First, to make full use of speech features and reduce the amount of data, the algorithm operates in the frequency domain. Second, an improved encoder is proposed to exploit the time-axis information of the log power spectrum; introducing tile layers enables feature extraction along both axes of the log power spectrum. Third, to allow a deeper network, the last point on the frequency axis is removed during processing and a zero is added back when restoring, so that the number of frequency-axis points is even. Finally, to exploit the time-domain information of the speech signal, a time-domain loss is added to the loss function. The effectiveness of the algorithm was verified on the TIMIT and VCTK data sets. The experimental results show that, compared with current mainstream algorithms, the new algorithm improves the quality of the high-bandwidth speech and yields a better listening experience.

Key words: speech bandwidth extension, tile layer, time-frequency two-axis feature extraction, time-frequency loss, network depth
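
The even-length frequency axis and the time-domain loss term described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the FFT size of 512, the placement of the restored zero at the end of the frequency axis, the use of mean squared error for both loss terms, and the weight `alpha` are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def log_power_spectrum(frames, n_fft=512):
    """Log power spectrum of time-domain frames, shape (num_frames, n_fft // 2 + 1)."""
    spec = np.fft.rfft(frames, n=n_fft, axis=-1)
    # A small constant avoids log(0); its value is an assumption.
    return np.log(np.abs(spec) ** 2 + 1e-10)

def drop_last_bin(lps):
    """Remove the last frequency point so the axis length is even (257 -> 256)."""
    return lps[..., :-1]

def restore_last_bin(lps_even):
    """Add a zero back on the frequency axis (assumed here to go at the end)."""
    pad = [(0, 0)] * (lps_even.ndim - 1) + [(0, 1)]
    return np.pad(lps_even, pad, mode="constant", constant_values=0.0)

def time_frequency_loss(pred_lps, true_lps, pred_wave, true_wave, alpha=0.5):
    """Frequency-domain loss plus a weighted time-domain loss.

    Both terms use mean squared error and alpha is a hypothetical weight;
    the abstract does not specify either choice.
    """
    freq_loss = np.mean((pred_lps - true_lps) ** 2)
    time_loss = np.mean((pred_wave - true_wave) ** 2)
    return freq_loss + alpha * time_loss

# Usage: four random frames of 512 samples each.
frames = np.random.default_rng(0).standard_normal((4, 512))
lps = log_power_spectrum(frames)      # shape (4, 257): odd frequency axis
even = drop_last_bin(lps)             # shape (4, 256): can be halved repeatedly
restored = restore_last_bin(even)     # shape (4, 257): last point is zero
```

With 256 points, the frequency axis can be halved by stride-2 downsampling eight times before reaching length 1, whereas 257 cannot be halved evenly even once, which is one plausible reading of why an even length makes a deeper network more convenient.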

CLC Number: