Quantum-Inspired Optimal Fractional-Order Spectrogram for Speech Recognition Enhancement
1. School of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, Fujian, China;
2. School of Zhicheng College, Fuzhou University, Fuzhou 350002, Fujian, China;
3. School of Music, Fujian Normal University, Fuzhou 350108, Fujian, China
Published online: 2025-12-22
To address the insufficient resolution and weak feature discriminability of conventional spectrograms in the time-frequency representation of speech signals, this paper proposes an optimal fractional-order spectrogram generation method based on a quantum-inspired Newton-Raphson algorithm, with the aim of improving speech recognition and classification performance. First, quantum encoding is used to initialize the population of the Newton-Raphson algorithm. Quantum rotation gates steer individuals toward the optimum, while quantum mutation and catastrophe operations maintain population diversity and prevent premature convergence. Simulated annealing and Lévy flight strategies are also integrated to strengthen the algorithm's global search capability. Next, the audio signal is windowed and framed, fractional-order spectrograms are generated via the fractional Fourier transform, and these are compressed into fractional-order Mel spectrograms using Mel filters. Finally, a tunable fractional-order parameter α is introduced to extend the flexibility of the signal representation in the time-frequency domain. With minimum information entropy as the objective function, the quantum-inspired Newton-Raphson algorithm adaptively optimizes the hyperparameters α, frame length, and frame shift, yielding the optimal fractional-order spectrogram. Experiments are conducted on the CEC 2022 benchmark functions, on the public RAVDESS (emotion recognition) and UrbanSound8K (sound classification) datasets, and on a self-collected dataset of sung vowel phonations. The results show that, compared with existing optimization algorithms, the proposed quantum-inspired Newton-Raphson algorithm has stronger global optimization capability and higher stability on high-dimensional complex problems. The generated optimal fractional-order spectrograms effectively concentrate signal energy and enhance feature separability, significantly outperforming traditional speech feature extraction methods in accuracy, recall, and F1-score for speech recognition. This work provides a new approach to high-precision feature extraction from complex speech signals, improving speech recognition performance with strong robustness and promising applicability.
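The entropy-driven selection of α, frame length, and frame shift described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`dfrft`, `frame_signal`, `normalized_entropy`, `grid_search`) are hypothetical; the fractional transform is a simple discrete variant built as a weighted sum of the identity, the DFT, the index reversal, and the inverse DFT, which is not necessarily the fractional Fourier transform the authors use; the entropy is normalized by the number of time-frequency cells so that different frame settings are comparable; the Mel filtering step is omitted; and a plain grid search stands in for the quantum-inspired Newton-Raphson optimizer.

```python
import numpy as np

def dfrft(frames, alpha):
    """Fractional power `alpha` of the unitary DFT, applied along the last axis.

    Assumed variant: F**alpha = sum_j c_j(alpha) * F**j for j = 0..3, which
    follows from F**4 = I. alpha = 0 is the identity, alpha = 1 the plain DFT.
    """
    k = np.arange(4)
    powers = [
        frames.astype(complex),                      # F^0 x = x
        np.fft.fft(frames, axis=-1, norm="ortho"),   # F^1 x
        np.roll(frames[..., ::-1], 1, axis=-1),      # F^2 x = x[(-n) mod N]
        np.fft.ifft(frames, axis=-1, norm="ortho"),  # F^3 x = F^-1 x
    ]
    out = np.zeros(frames.shape, dtype=complex)
    for j, term in enumerate(powers):
        c_j = np.exp(1j * np.pi * k * (j - alpha) / 2).mean()
        out += c_j * term
    return out

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal (len(x) >= frame_len) into Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)

def normalized_entropy(x, alpha, frame_len, hop):
    """Shannon entropy of the fractional power spectrogram, scaled to [0, 1]."""
    spec = np.abs(dfrft(frame_signal(x, frame_len, hop), alpha)) ** 2
    p = spec.ravel() / spec.sum()
    p = p[p > 0]  # skip empty cells, since 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum() / np.log(spec.size))

def grid_search(x, alphas, frame_lens, hop_ratios):
    """Return (entropy, alpha, frame_len, hop) with the lowest entropy."""
    candidates = (
        (normalized_entropy(x, a, n, max(1, int(n * r))), a, n, max(1, int(n * r)))
        for a in alphas for n in frame_lens for r in hop_ratios
    )
    return min(candidates)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(4000) / 8000.0  # 0.5 s at an assumed 8 kHz sampling rate
    x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
    print(grid_search(x, np.linspace(0.0, 1.0, 5), [128, 256, 512], [0.25, 0.5]))
```

A lower entropy indicates that the signal energy is concentrated in fewer time-frequency cells, which is the property the abstract attributes to the optimal fractional-order spectrogram. An exhaustive grid is only practical for a handful of candidate values; a population-based optimizer such as the one proposed in the paper would replace `grid_search` when the search space is continuous.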
SUN Lei, ZHANG Xianheng, LIAO Yipeng, et al. Quantum-Inspired Optimal Fractional-Order Spectrogram for Speech Recognition Enhancement[J]. Journal of South China University of Technology (Natural Science), 0: 1. DOI: 10.12141/j.issn.1000-565X.250324