Journal of South China University of Technology (Natural Science Edition) ›› 2021, Vol. 49 ›› Issue (11): 87-94. doi: 10.12141/j.issn.1000-565X.210173

Special topic: Electronics, Communication and Automatic Control, 2021

• Electronics, Communication and Automatic Control •

Speech Bandwidth Extension Based on Flatten-CNN

YANG Junmei, LEI Yang, CHEN Xikun

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2021-03-29 Revised: 2021-09-29 Online: 2021-11-25 Published: 2021-11-01
  • Contact: YANG Junmei, E-mail: yjunmei@scut.edu.cn
  • About author: YANG Junmei (born 1979), female, Ph.D., associate professor; her research focuses on intelligent information processing.
  • Supported by:
    the National Natural Science Foundation of China (61871188, 61801133)

Abstract: Existing deep learning-based speech bandwidth extension algorithms have several shortcomings. Time-domain algorithms extract speech features imprecisely and require large amounts of training data. Frequency-domain algorithms neglect the correlation between frames when extracting log power spectrum features, have an odd number of points on the frequency axis, which makes it inconvenient to deepen the network, and ignore time-domain information. Algorithms that operate in both the time and frequency domains have relatively complex models. To address these problems, this paper proposes a speech bandwidth extension algorithm based on Flatten-CNN. First, to make full use of speech features and reduce the amount of data, the algorithm operates in the frequency domain. Second, to exploit the time-axis information of the log power spectrum, an improved encoder is proposed that introduces a flatten layer to extract features along both the time and frequency axes of the log power spectrum. Third, to allow a deeper network, the last point on the frequency axis is removed during data processing and a zero is padded back during restoration, so that the number of frequency-axis points is even. Finally, to exploit the time-domain information of the speech signal, a time-domain loss is introduced into the loss function. The model was trained and tested on the TIMIT and VCTK datasets. Experimental results show that, compared with current mainstream algorithms, the proposed algorithm improves the quality of the generated high-bandwidth speech and yields a better listening experience.

Key words: speech bandwidth extension, flatten layer, time-frequency two-axis feature extraction, time-frequency loss, network depth
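
The frequency-axis step described in the abstract (remove the last point, pad a zero back on restoration) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 129-bin frame size (what a 256-point FFT would give), and the `halvings` helper are assumptions chosen only to show why an even bin count is convenient for a deeper network.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins (log power spectrum)


def drop_last_bin(lps: Spectrogram) -> Spectrogram:
    """Remove the highest-frequency bin of every frame (odd -> even bin count)."""
    return [frame[:-1] for frame in lps]


def restore_last_bin(lps: Spectrogram, fill: float = 0.0) -> Spectrogram:
    """Append one bin per frame so the original bin count is recovered.

    The zero fill follows the abstract; the default value is an assumption here.
    """
    return [frame + [fill] for frame in lps]


def halvings(n: int) -> int:
    """Count how many times n halves exactly, e.g. under stride-2 downsampling."""
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count


# Hypothetical sizes: 4 frames, 129 bins per frame (256-point FFT).
n_frames, n_bins = 4, 129
lps = [[float(t * n_bins + f) for f in range(n_bins)] for t in range(n_frames)]

even = drop_last_bin(lps)          # 4 x 128
restored = restore_last_bin(even)  # 4 x 129, last bin zero-filled
```

With these assumed sizes, 129 bins cannot be halved even once without a remainder, whereas 128 bins halve cleanly seven times, which is one plausible reading of why an even frequency axis makes it easier to stack downsampling layers.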
