Journal of South China University of Technology(Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (9): 1-10.doi: 10.12141/j.issn.1000-565X.250134
• Computer Science & Technology • Previous Articles Next Articles
LI Yue1, HUANG Yihan1, PENG Zhengwei2, XIE Jixuan1, DU Yuye1
Received:2025-05-06
Online:2025-09-25
Published:2025-05-20
About author:李粤(1974—),女,博士,副教授,主要从事人工智能、数据挖掘、计算机科普研究。E-mail:liyue@scut.edu.cn
Supported by:CLC Number:
LI Yue, HUANG Yihan, PENG Zhengwei, XIE Jixuan, DU Yuye. CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis[J]. Journal of South China University of Technology(Natural Science Edition), 2025, 53(9): 1-10.
Table 3
Frequency distribution of phoneme"
| 音素 | 频次 | 音素 | 频次 | 音素 | 频次 | 音素 | 频次 | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s | 1 514 | k | 136 | ak | 65 | ong | 337 | ||||||
| m | 973 | kw | 24 | e | 1 365 | ok | 141 | ||||||
| j | 1 367 | aa | 4 782 | ei | 514 | u | 1 623 | ||||||
| c | 874 | aai | 151 | eng | 36 | ui | 101 | ||||||
| l | 880 | aau | 54 | i | 4 649 | un | 104 | ||||||
| z | 920 | aam | 69 | iu | 398 | ung | 498 | ||||||
| h | 758 | aan | 356 | im | 85 | uk | 194 | ||||||
| f | 489 | aang | 101 | in | 456 | oe | 758 | ||||||
| g | 756 | aak | 77 | ing | 599 | oeng | 446 | ||||||
| w | 386 | ai | 389 | ip | 34 | oek | 50 | ||||||
| n | 446 | au | 530 | it | 80 | ng | 2 799 | ||||||
| t | 402 | am | 336 | ik | 106 | eoi | 368 | ||||||
| b | 544 | an | 552 | o | 3 244 | eon | 82 | ||||||
| d | 476 | ang | 133 | oi | 404 | yu | 977 | ||||||
| gw | 159 | ap | 57 | ou | 407 | yun | 233 | ||||||
| p | 147 | at | 355 | on | 66 | yut | 98 | ||||||
| [1] | XU J .The language features and cultural implication of Cantonese opera librettos[J].Frontiers in Art Research,2021,3(2):20-29. |
| [2] | 单韵鸣,杜金凤 .地方非物质文化遗产的传播困境与现代化发展模式的探索:以粤剧为例[J].学术研究,2024(8):54-60. |
| SHAN Yunming, DU Jinfeng .The promotion issues and the modemization model of the regional intangible cultural heritage:a case study of cantonese opem[J].Academic Research,2024(8):54-60. | |
| [3] | ZHANG L, LI R, WANG S,et al .M4Singer:a multi-style,multi-singer and musical score provided mandarin singing corpus[C]∥ Proceedings of Advances in Neural Information Processing Systems.New Orleans:MIT Press,2022:6914-6926. |
| [4] | ZHANG Z, ZHENG Y, LI X,et al .WeSinger 2:fully parallel singing voice synthesis via multi-singer conditional adversarial training[C]∥ Proceedings of 2023 IEEE International Conference on Acoustics,Speech and Signal Processing.Rhodes Island:IEEE,2023:1-5. |
| [5] | WANG C, ZENG C, HE X .Xiaoicesing 2:a high-fidelity singing voice synthesizer based on generative adversarial network[EB/OL].(2022-10-26)[2025-05-05].. |
| [6] | LIU X, ZHANG W, ZHENG Z,et al .FGP-GAN:fine-grained perception integrated generative adversarial network for expressive mandarin singing voice synthesis [J].IEEE Transactions on Consumer Electronics,2024,70(3):6054-6063. |
| [7] | CUI J, GU Y, WENG C,et al .Sifisinger:a high-fidelity end-to-end singing voice synthesizer based on source-filter model[C]∥ Proceedings of 2024 IEEE International Conference on Acoustics,Speech and Signal Processing.Seoul:IEEE,2024:11126-11130. |
| [8] | LIU J, LI C, REN Y,et al .DiffSinger:singing voice synthesis via shallow diffusion mechanism[C]∥ Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence.Vancouver:AAAI,2022:11020-11028. |
| [9] | REPETTO R C, SERRA X .A collection of music scores for corpus based jingju singing research[C]∥ Proceedings of the 18th International Society for Music Information Retrieval Conference.Suzhou:ISMIR,2017:46-52. |
| [10] | ZHENG M, BAI P, SHI X,et al .FT-GAN:fine-grained tune modeling for Chinese opera synthesis[C]∥ Proceedings of the Thirty-Eighth AAAI Confe-rence on Artificial Intelligence.Vancouver:AAAI,2024:19697-19705. |
| [11] | BOŽIĆ M, HORVAT M .A survey of deep learning audio generation methods[EB/OL].(2024-05-31)[2025-05-05].. |
| [12] | KIM J, KONG J,SON J .Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech[C]∥ Proceedings of the 38th International Conference on Machine Learning.Vienna:ML Research Press,2021:5530-5540. |
| [13] | ZHANG Y, CONG J, XUE H,et al .VISinger:variational inference with adversarial learning for end-to-end singing voice synthesis[C]∥ Proceedings of 2022 IEEE International Conference on Acoustics,Speech and Signal Processing.Singapore:IEEE,2022:7237-7241. |
| [14] | ZHANG Y, XUE H, LI H,et al .VISinger 2:high-fidelity end-to-end singing voice synthesis enhanced by digital signal processing synthesizer[EB/OL].(2022-11-05)[2025-05-05].. |
| [15] | HWANG J S, LEE S H, LEE S W .HiddenSinger:high-quality singing voice synthesis via neural audio codec and latent diffusion models[J].Neural Networks,2025,181:106762/1-10. |
| [16] | ZHANG Y, HUANG R, LI R,et al .StyleSinger:style transfer for out-of-domain singing voice synthesis[C]∥ Proceedings of the Thirty-Eighth AAAI Confe-rence on Artificial Intelligence.Vancouver:AAAI,2024:19597-19605. |
| [17] | BLACK D A, LI M, TIAN M .Automatic identification of emotional cues in Chinese opera singing[C]∥ Proceedings of the 13th International Conference on Music Perception and Cognition.Seoul:[s.n.],2014:250-255. |
| [18] | ISLAM R, XU M, FAN Y .Chinese traditional opera database for music genre recognition[C]∥ Procee-dings of the 18th Oriental COCOSDA/CASLRE.Shanghai:IEEE,2015:38-41. |
| [19] | LI Y, PENG Z, XU D,et al .RoleNet:a multiple features fusion network for role classification in Canto-nese opera[J].Multimedia Tools and Applications,(2025-01-28).. |
| [20] | CHEN Q, ZHAO W, WANG Q,et al .The sustai-nable development of intangible cultural heritage with AI:Cantonese opera singing genre classification based on CoGCNet model in China[J].Sustainability,2022,14:2923/1-20. |
| [21] | LI Q, HU B .Joint time and frequency transformer for Chinese opera classification[C]∥ Proceedings of Interspeech 2023.Dublin:ISCA,2023:3919-3923. |
| [22] | WU Y, LI S, YU C,et al .Peking Opera synthesis via duration informed attention network[EB/OL].(2020-08-07)[2025-05-05].. |
| [23] | BAI P, ZHOU Y, ZHENG M,et al .Improving Chinese pop song and Hokkien Gezi Opera singing voice synthesis by enhancing local modeling[C]∥ Procee-dings of the 2023 Conference on Empirical Methods in Natural Language Processing.Singapore:ACL,2023:3302-3312. |
| [24] | ZHOU X, SUN W, SHI X .A high-quality melody-aware Peking Opera synthesizer using data augmentation[C]∥ Proceedings of 2023 IEEE International Confe-rence on Multimedia and Expo.Brisbane:IEEE,2023:1092-1097. |
| [25] | REN Y, HU C, TAN X,et al .FastSpeech 2:fast and high-quality end-to-end text to speech[EB/OL].(2020-06-08)[2025-05-05].. |
| [26] | MCAULIFFE M, SOCOLOF M, MIHUC S,et al .Montreal forced aligner:trainable text-speech alignment using kaldi[C]∥ Proceedings of Interspeech 2017.Stockholm:ISCA,2017:498-502. |
| [27] | SOLOVYEY R, STEMPKOVSKIY A, HABRUSEVA T .Benchmarks and leaderboards for sound demixing tasks[EB/OL].(2024-05-07)[2025-05-05].. |
| [28] | BOERSMA P .Praat:doing phonetics by computer [CP/OL].(2011-05-01)[2025-05-05].. |
| [29] | DAI S, WU Y, CHEN S,et al .SingStyle111:a multilingual singing dataset with style transfer[C]∥ Proceedings of the 24th International Society for Music Information Retrieval Conference.Milan:ISMIR,2023:765-773. |
| [30] | JADOUL Y, THOMPSON B, DE BOER B .Introdu-cing Parselmouth:a Python interface to Praat[J].Journal of Phonetics,2018,71:1-15. |
| [31] | LI R, ZHANG Y, WANG Y,et al .Robust singing voice transcription serves synthesis[EB/OL].(2024-05-16)[2025-05-05].. |
| [32] | 关子尹,邓伟生,赵子明 .粤语审音配词字库 [DB/OL].(2003-01-12)[2025-05-05].. |
| [33] | KONG J, KIM J,BAE J .HiFi-GAN:generative adversarial networks for efficient and high fidelity speech synthesis[C]∥ Advances in Neural Information Processing Systems 33:34th Conference on Neural Information Processing Systems.San Diego:Neural Information Processing Systems Foundation,Inc.,2020:17022-17033. |
| [34] | VASWANI A, SHAZEER N, PARMAR N,et al .Attention is all you need[C]∥ Proceedings of Advances in Neural Information Processing Systems.Long Beach:MIT Press,2017:6000-6010. |
| [35] | GULATI A, QIN J, CHIU C C,et al .Conformer:convolution-augmented transformer for speech recognition[EB/OL].(2020-05-16)[2025-05-05].. |
| [36] | CHEN J, TAN X, LUAN J,et al .Hifisinger:towards high-fidelity neural singing voice synthesis[EB/OL].(2020-09-03)[2025-05-05].. |
| [1] | HU Yongjian, ZHUO Sichao, LIU Beibei, WANG Yufei, LI Jicheng. Improvement of Cross-Dataset Performance of Face Forgery Detection Based on Multi-Scale Spatiotemporal Features and Tampering Probabilities [J]. Journal of South China University of Technology(Natural Science Edition), 2024, 52(6): 110-119. |
| [2] | YUAN Renteng, WANG Chenzhu, XIANG Qiaojun, ZHENG Ou, DING Shengxuan. Trajectory Data-Driven Model for Vehicle Lane Change Intention Recognition [J]. Journal of South China University of Technology(Natural Science Edition), 2024, 52(6): 34-44. |
| Viewed | ||||||
|
Full text |
|
|||||
|
Abstract |
|
|||||