Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (7): 70-79. doi: 10.12141/j.issn.1000-565X.240508

• Electronics, Communication & Automation Technology •

Acoustic Scene Classification Method Based on Reducing High-Frequency Reverberation and RF-DRSN-EMA

CAO Yi, WANG Yanwen, LI Jie, ZHENG Zhi, SUN Hao   

  1. School of Mechanical Engineering / Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2024-10-14 Online: 2025-07-25 Published: 2025-01-17
  • About author: CAO Yi (born 1974), male, Ph.D., professor, mainly engaged in research on speech recognition technology. E-mail: caoyi@jiangnan.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (52175234); the Programme of Introducing Talents of Discipline to Universities (B18027)

Abstract:

To address the low classification accuracy and poor generalization of existing acoustic scene classification methods, this paper proposes an acoustic scene classification method based on reducing high-frequency reverberation and a frequency-domain residual shrinkage network with multi-scale attention, named RF-DRSN-EMA. Firstly, following the principle of sound reverberation reduction, the paper introduces a method for reducing high-frequency reverberation. The method attenuates only the high-frequency reverberation and preserves the essential frequency information in the other bands, which enhances speech intelligibility and minimizes the impact of speech distortion. Secondly, building on the deep residual shrinkage network, the proposed RF-DRSN-EMA integrates an improved frequency-domain self-calibration mechanism and a multi-scale attention module. The network uses an RF self-calibration module with a long-short residual structure to mitigate feature collapse, enabling efficient extraction of frequency-domain information. A multi-scale attention module is then applied at the output of each unit to highlight relevant information, further enhancing the model's representation capacity. Finally, the proposed method is evaluated on three benchmark datasets: ESC-10, UrbanSound8K, and DCASE2020 Task 1A. The results show that the proposed high-frequency reverberation reduction method effectively suppresses high-frequency reverberation and background noise while eliminating redundant features, with minimal degradation of speech quality. The RF-DRSN-EMA network achieves efficient frequency-domain denoising and feature extraction, reaching classification accuracies of 98.00%, 93.42%, and 72.80% on ESC-10, UrbanSound8K, and DCASE2020 Task 1A, respectively. These results confirm the effectiveness and generalizability of the proposed method.
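For readers unfamiliar with deep residual shrinkage networks, the shrinkage step they build on is soft thresholding with a per-channel threshold. The following is a minimal NumPy sketch of that generic operation only, not the authors' RF-DRSN-EMA implementation. The function names, the feature layout `(channels, freq, time)`, and the fixed `alpha` scale are assumptions made for illustration; in a shrinkage network the scale is typically learned by a small sub-network rather than supplied by hand.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every value toward zero by tau; values with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channelwise_shrink(features, alpha):
    """Apply soft thresholding with one threshold per channel.

    features: array of shape (channels, freq, time)  (assumed layout)
    alpha:    array of shape (channels,), each value in (0, 1)
              (hypothetical fixed scale; normally learned)
    """
    # Threshold = scale * mean absolute activation of the channel.
    abs_mean = np.abs(features).mean(axis=(1, 2))
    tau = (alpha * abs_mean)[:, None, None]
    return soft_threshold(features, tau)


# Toy input: one channel with a 2x2 time-frequency patch.
features = np.array([[[0.5, -2.0], [0.1, 3.0]]])
alpha = np.array([0.5])
shrunk = channelwise_shrink(features, alpha)
```

In this toy input the mean absolute value is 1.4, so the threshold is 0.7: the two small entries are set to zero and the two large entries are pulled toward zero by 0.7. This is how a shrinkage step suppresses low-magnitude, noise-like activations while keeping the dominant ones.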

Key words: acoustic scene classification, reducing high-frequency reverberation, frequency-domain residual shrinkage network, multi-scale attention, speech enhancement

CLC Number: