Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (7): 1-. doi: 10.12141/j.issn.1000-565X.240508

• Electronics, Communication & Automation Technology •    

Acoustic Scene Classification Method Based on Reducing High-Frequency Reverberation and RF-DRSN-EMA

CAO Yi   WANG Yanwen   LI Jie   ZHENG Zhi   SUN Hao   

  1. School of Mechanical Engineering/ Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, Jiangsu, China

  • Online:2025-07-25 Published:2025-01-17

Abstract:

To address the low classification accuracy and weak generalization of existing acoustic scene classification methods, this paper proposes a method based on high-frequency reverberation reduction and a frequency-domain residual shrinkage network with multi-scale attention (RF-DRSN-EMA). First, the principles of reverberation reduction are presented, and an algorithm that specifically reduces high-frequency reverberation is proposed. The algorithm attenuates only the high-frequency reverberation while preserving essential frequency information in the other bands, which enhances speech intelligibility and minimizes speech distortion. Second, building on the deep residual shrinkage network and combining an improved frequency-domain self-calibration algorithm with a multi-scale attention module, the RF-DRSN-EMA network is proposed. The model uses an RF self-calibration block whose internal long-distance and short-distance residual structures alleviate feature collapse. To collect frequency-domain information efficiently, a multi-scale attention module is applied at the output of the unit; it focuses further on the effective information at the unit's output layer and thereby strengthens the representation ability of the model. Finally, experiments were carried out on the ESC-10, UrbanSound8K and DCASE2020 Task 1A datasets. The results show that the speech enhancement method for reducing high-frequency reverberation lessens the impact of background noise such as high-frequency reverberation and eliminates redundant features with little damage to sound quality, leading to better classification performance. In addition, RF-DRSN-EMA achieves feature denoising and efficient information collection in the frequency domain, and the best classification accuracy of the model reaches 98.00%, 93.42% and 72.80% on the three datasets, respectively, which verifies the effectiveness and generalization ability of the network.
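
For readers unfamiliar with residual shrinkage networks, the denoising step they are generally built around is soft thresholding: feature values whose magnitude falls below a threshold are set to zero, and the remaining values are shrunk toward zero by that threshold. The sketch below illustrates only this generic operation in plain Python and is not the RF-DRSN-EMA implementation. The function names `soft_threshold` and `shrink_channel` are hypothetical, and the scaling factor `alpha`, which a shrinkage network would typically produce with a small learned sub-network, is supplied by hand here as an assumption.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def shrink_channel(features, alpha):
    """Soft-threshold one channel of feature values.

    The threshold is alpha * mean(|features|), with alpha in (0, 1).
    Assumption for illustration: alpha is passed in by hand, whereas a
    shrinkage network would normally learn it per channel.
    """
    if not features:
        return []
    tau = alpha * sum(abs(v) for v in features) / len(features)
    return [soft_threshold(v, tau) for v in features]


# Mean absolute value is 2.0, so with alpha = 0.5 the threshold is 1.0.
print(shrink_channel([4.0, -2.0, 0.5, -1.5], 0.5))  # [3.0, -1.0, 0.0, -0.5]
```

Small-magnitude entries, which are more likely to be noise, are removed entirely, while larger entries keep their sign and most of their magnitude.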


Key words: acoustic scene classification, high-frequency reverberation reduction, frequency-domain residual shrinkage network, multi-scale attention, speech enhancement