Music Source Separation Method Based on Unet Combining SE and BiSRU

ZHANG Ruifeng, BAI Jintong, GUAN Xin, et al.

doi:10.12141/j.issn.1000-565X.200593

Journal of South China University of Technology (Natural Science)

2021, Vol. 49, Issue 11: 106-115, 134

Electronics, Communication & Automation Technology

School of Microelectronics, Tianjin University, Tianjin 300072, China

ZHANG Ruifeng (1974-), male, Ph.D., associate professor; research interests: machine vision and audio processing. E-mail: zhangruifeng@tju.edu.cn

Received date: 2020-09-30

Revised date: 2020-12-29

Online published: 2021-01-11

Supported by the National Natural Science Foundation of China (61471263) and the Natural Science Foundation of Tianjin (16JCZDJC31100)

Abstract

Music source separation is one of the most important research topics in music information retrieval. Traditional music source separation methods suffer from shortcomings such as dependence on modeling hypotheses, limited model complexity, and poor representation ability. Time-domain end-to-end deep learning models, developed to address these problems, take a long time to train, and their separation performance still needs improvement. Therefore, to further improve the representation ability and computational efficiency of time-domain end-to-end separation models, this study proposed Unet-SE-BiSRU, an end-to-end network built on Demucs, currently the best-performing time-domain separation model. An attention mechanism was introduced into the generalized encoding and decoding layers: a squeeze-and-excitation (SE) block was used to extract features selectively according to the type of audio to be separated. To counter the gradient explosion or vanishing that may occur during training, group normalization was added after the one-dimensional convolution. The bidirectional long short-term memory network was simplified to a bidirectional simple recurrent unit (BiSRU), which improves the parallelism of training and reduces the number of model parameters. Experimental results show that the improved model raises the signal-to-noise ratio by 0.34 dB, the best result among time-domain end-to-end methods to the best of our knowledge, and reduces training time by three fifths.
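The SE block mentioned above follows the standard squeeze-and-excitation pattern: average each channel over time (squeeze), pass the per-channel statistics through a small bottleneck ending in a sigmoid (excitation), and rescale each channel by its gate. The sketch below applies this to a single (channels x time) feature map in plain Python. The weight matrices, the reduction ratio implied by their shapes, and the omission of biases are illustrative assumptions; the abstract does not say how the block is configured or where exactly it sits in each encoding/decoding layer.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def se_block(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation over a (channels x time) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is the
    reduction ratio (an assumed hyperparameter). Biases are omitted.
    """
    # Squeeze: global average pooling over time, one statistic per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid, one gate in (0, 1) per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(v) for v in matvec(w2, hidden)]
    # Scale: every time step of a channel is multiplied by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


if __name__ == "__main__":
    x = [[1.0, 3.0], [0.0, 2.0], [4.0, 4.0], [-1.0, 1.0]]  # C = 4, T = 2
    w1 = [[0.0] * 4 for _ in range(2)]  # r = 2, so the bottleneck has 2 units
    w2 = [[0.0] * 2 for _ in range(4)]
    print(se_block(x, w1, w2))
```

With all-zero weights every gate is sigmoid(0) = 0.5, so the block simply halves its input; trained weights instead learn, per input, which channels to keep and which to suppress, which is how the SE block can favor features relevant to the source being separated.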
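Group normalization, which the abstract adds after the one-dimensional convolution to stabilize training, splits the channels into groups and normalizes each group with its own mean and variance computed over all of its channels and time steps. Because the statistics are computed per example, the result does not depend on batch size. The sketch below handles one (channels x time) feature map in plain Python; the number of groups, epsilon, and the per-channel affine parameters gamma and beta are assumptions, since the abstract gives none of these values.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def group_norm(x: Matrix, num_groups: int,
               gamma: Optional[List[float]] = None,
               beta: Optional[List[float]] = None,
               eps: float = 1e-5) -> Matrix:
    """Group normalization of a single (channels x time) feature map."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    # Default affine parameters leave the normalized values unchanged.
    gamma = gamma if gamma is not None else [1.0] * channels
    beta = beta if beta is not None else [0.0] * channels
    size = channels // num_groups
    out: Matrix = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        # Statistics are shared by all channels and time steps in the group.
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        denom = math.sqrt(var + eps)
        for offset, ch in enumerate(group):
            c = g * size + offset  # index of this channel in the full map
            out.append([gamma[c] * (v - mean) / denom + beta[c] for v in ch])
    return out


if __name__ == "__main__":
    feats = [[1.0, 3.0], [5.0, 7.0], [10.0, 10.0], [10.0, 10.0]]
    print(group_norm(feats, num_groups=2))
```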
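The parallelism gain claimed for the BiSRU comes from how a simple recurrent unit is structured: every matrix product depends only on the current input x_t, so it can be computed for all time steps at once, and only a cheap element-wise recurrence on the cell state has to run sequentially. The sketch below follows one published form of the SRU recurrence (Lei et al.) and runs it forward and backward, concatenating the two outputs at each time step as a bidirectional layer does. The specific SRU variant, the tanh activation, the zero initial state, equal input and hidden sizes (required by the highway term), and all function names are assumptions; the abstract does not give the paper's exact BiSRU configuration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# (W, W_f, W_r, b_f, b_r) for one direction; all matrices are d x d.
Params = Tuple[Matrix, Matrix, Matrix, Vector, Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def sru_direction(xs: List[Vector], w: Matrix, wf: Matrix, wr: Matrix,
                  bf: Vector, br: Vector) -> List[Vector]:
    """One direction of a simple recurrent unit (input size == hidden size)."""
    # Stage 1: matrix products depend only on x_t, so this is parallel over time.
    x_tilde = [matvec(w, x) for x in xs]
    f = [[sigmoid(a + b) for a, b in zip(matvec(wf, x), bf)] for x in xs]
    r = [[sigmoid(a + b) for a, b in zip(matvec(wr, x), br)] for x in xs]
    # Stage 2: only this element-wise recurrence is sequential.
    c = [0.0] * len(bf)
    hs = []
    for t, x in enumerate(xs):
        # c_t = f_t * c_{t-1} + (1 - f_t) * x~_t
        c = [ft * ct + (1.0 - ft) * xt for ft, ct, xt in zip(f[t], c, x_tilde[t])]
        # h_t = r_t * tanh(c_t) + (1 - r_t) * x_t  (highway connection)
        hs.append([rt * math.tanh(ct) + (1.0 - rt) * xt
                   for rt, ct, xt in zip(r[t], c, x)])
    return hs


def bisru(xs: List[Vector], fwd: Params, bwd: Params) -> List[Vector]:
    """Concatenate forward and time-reversed backward SRU outputs per step."""
    forward = sru_direction(xs, *fwd)
    backward = sru_direction(xs[::-1], *bwd)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]
```

Compared with an LSTM, there is no hidden-to-hidden matrix inside the time loop, which is what allows the heavy computation to be batched across time steps and also removes the recurrent weight matrices from the parameter count.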

Key words: music source separation; Unet; time-domain end-to-end separation model; simple recurrent unit; squeeze-and-excitation; group normalization

Cite this article

ZHANG Ruifeng, BAI Jintong, GUAN Xin, et al. Music Source Separation Method Based on Unet Combining SE and BiSRU[J]. Journal of South China University of Technology (Natural Science), 2021, 49(11): 106-115, 134. DOI: 10.12141/j.issn.1000-565X.200593
