Journal of South China University of Technology(Natural Science Edition)

• Computer Science & Technology • Previous Articles     Next Articles

Design and Optimization of Fast Fourier Transform Based on Ascend NPU

LU Lu1 WANG Yuanfei1 LIANG Zhihong2 SUO Siliang2   

  1. 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China;

    2. Electric Power Research Institute, CSG/Guangdong Provincial Key Laboratory of Power System Network Security, Guangzhou 510663, Guangdong, China

  • Published:2025-02-28

Abstract:

Fast Fourier Transform (FFT), as a fundamental algorithm in scientific computing and signal processing, has been widely applied in digital signal processing, image processing, deep learning, and various other fields. With the growing data scale and increasing processing demands, optimizing FFT algorithms on emerging hardware platforms has become increasingly crucial. This research first conducts an in-depth analysis of Ascend NPU architectural characteristics and their implications for FFT optimization. Based on the matrix-computation-based Stockham FFT algorithm, a series of innovative optimization strategies are proposed: (1) develops a heuristic radix selection algorithm to provide effective radix sequence combinations for different input sizes; (2) designs an efficient computation flow for single-iteration FFT that eliminates the need for real-imaginary separation, significantly reducing global memory access overhead; (3) presents an on-chip cache-based data reading optimization strategy, substantially improving data access speed; (4) implements a data layout optimization method for multiple iterations, effectively enhancing overall memory access efficiency. Experimental results on the Huawei Ascend Atlas 800 platform equipped with Ascend 910 AI processor demonstrate that the proposed optimization strategies achieve an average 4.61-fold speedup compared to non-optimized implementations. Through independent performance analysis and validation of each optimization strategy, results show significant individual speedups ranging from 1.42× to 3.52×, providing valuable technical references for implementing efficient FFT algorithms on emerging NPU architectures.

Key words: fast Fourier transform, ascend NPU, heterogeneous computing, high-performance computing