| [1] |
COOLEY J W, TUKEY J W .An algorithm for the machine calculation of complex Fourier series[J].Ma-thematics of Computation,1965,19(90):297-301.
|
| [2] |
OPPENHEIM A V .Discrete-time signal processing[M].[S. l.]:Pearson Education India,1999.
|
| [3] |
JÄHNE B .Digital image processing[M].Berlin:SpringerScience & Business Media,2005.
|
| [4] |
DUHAMEL P, VETTERLI M .Fast Fourier transforms: a tutorial review and a state of the art[J].Signal Processing,1990,19(4):259-299.
|
| [5] |
ZHANG Y, LI X .Fast convolutional neural networks with fine-grained FFTs[C]∥Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques.New York:Association for Computing Machinery,2020:255-265.
|
| [6] |
TOLMACHEV D .VkFFT—a performant,cross-platform and open-source GPU FFT library[J].IEEE Access,2023,11:12039-12058.
|
| [7] |
HU Y, LU L, LI C .Memory-accelerated parallel method for multidimensional fast Fourier implementation on GPU[J].The Journal of Supercomputing,2022,78(16):18189-18208.
|
| [8] |
LEE J, KIM D .Large-scale 3D fast Fourier transform computation on a GPU[J].ETRI Journal,2023,45(6):1035-1045.
|
| [9] |
ZHAO Y, LIU F, MA W,et al .MFFT: a GPU accelerated highly efficient mixed-precision large-scale FFT framework[J].ACM Transactions on Architecture and Code Optimization,2023,20(3):1-23.
|
| [10] |
MARKIDIS S, DER CHIEN S W, LAURE E,et al .NVIDIA tensor core programmability,performance & precision[C]∥Proceedings of 2018 IEEE International Parallel and Distributed Processing Symposium Workshops.Vancouver:IEEE,2018:522-531.
|
| [11] |
SORNA A, CHENG X, D’ AZEVEDO E,et al .Optimizing the fast Fourier transform using mixed precision on tensor core hardware[C]∥Proceedings of 2018 IEEE 25th International Conference on High Performance Computing Workshops.Bengaluru:IEEE,2018:3-7.
|
| [12] |
CHENG X, SORNA A, D’ AZEVEDO E,et al .Accelerating 2D FFT:exploit GPU tensor cores through mixed-precision[C]∥Proceedings of the International Conference for High Performance Computing,Networking,Storage,and Analysis.Dallas:ACM,2018.
|
| [13] |
DURRANI S, CHUGHTAI M S, HIDAYETOGLU M,et al .Accelerating Fourier and number theoretic transforms using tensor cores and warp shuffles[C]∥Proceedings of 2021 30th International Conference on Pa-rallel Architectures and Compilation Techniques.Atlanta:IEEE,2021:345-355.
|
| [14] |
PISHA L, LIGOWSKI Ł .Accelerating non-power-of-2 size Fourier transforms with GPU tensor cores[C]∥Proceedings of 2021 IEEE International Parallel and Distributed Processing Symposium.Portland:IEEE,2021:507-516.
|
| [15] |
LI B, CHENG S, LIN J .TcFFT:a fast half-precision FFT library for NVIDIA tensor cores[C]∥Proceedings of 2021 IEEE International Conference on Cluster Computing.Portland:IEEE,2021:1-11.
|
| [16] |
JOUPPI N P, YOUNG C, PATIL N,et al .In-datacenter performance analysis of a tensor processing unit[C]∥Proceedings of the 44th Annual International Symposium on Computer Architecture.New York:Association for Computing Machinery,2017:1-12.
|
| [17] |
ABADI M, BARHAM P, CHEN J,et al .TensorFlow:a system for large-scale machine learning[C]∥Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation.Savannah:USENIX Association,2016:265-283.
|
| [18] |
LU T, CHEN Y F, HECHTMAN B,et al .Large-scale discrete Fourier transform on TPUs[J].IEEE Access,2021,9:93422-93432.
|
| [19] |
LU T, MARIN T, ZHUO Y,et al .Nonuniform fast Fourier transform on TPUs[C]∥Proceedings of 2021 IEEE 18th International Symposium on Biomedical Imaging.Nice:IEEE,2021:783-787.
|
| [20] |
LI Q, ZUO D, FENG Y,et al .Research on high-performance Fourier transform algorithms based on the NPU[J].Applied Sciences,2024,14(1):405/1-19.
|