Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (9): 1-10. doi: 10.12141/j.issn.1000-565X.250134

• Computer Science & Technology •

CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis

LI Yue1, HUANG Yihan1, PENG Zhengwei2, XIE Jixuan1, DU Yuye1   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  2. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, Guangdong, China
  • Received: 2025-05-06 Online: 2025-09-25 Published: 2025-05-20
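  • About author: LI Yue (born 1974), female, Ph.D., associate professor; her research focuses on artificial intelligence, data mining, and computer science popularization. E-mail: liyue@scut.edu.cn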
  • Supported by:
    the National Natural Science Foundation of China(62476096)

Abstract:

Chinese opera, one of the traditional Chinese arts, has a distinctive musical expressiveness. Cantonese opera, a major Chinese opera genre and an important carrier of Lingnan culture, has been inscribed on the World Intangible Cultural Heritage List. In recent years, generative artificial intelligence has demonstrated powerful capabilities in content creation; singing synthesis technology, for example, can synthesize natural-sounding singing from a specified music score. This offers a new approach to the digital preservation and innovation of Cantonese opera. However, collecting and organizing opera data is hampered by poor audio quality and complex dialect annotation, resulting in an extreme shortage of high-quality opera datasets. To address this, this paper applies singing synthesis technology from pop music to Cantonese opera vocal synthesis and proposes CODS, the first Cantonese opera vocal synthesis dataset with phoneme-level annotation and audio-text alignment. First, the CODS dataset was constructed through a systematic process. It is derived from 29 original works by four famous performers, with a total duration of 3.81 hours, and provides important support for the research and digitization of Cantonese opera. Second, using this dataset, experiments were conducted with a deep learning-based method for Cantonese opera vocal synthesis, achieving controllable generation in terms of lyrics, timbre, and melody. Finally, a comprehensive evaluation framework for Cantonese opera synthesis was established. Both objective and subjective evaluations reached a satisfactory level within the domain, further validating the usability of the proposed dataset. The CODS dataset fills a gap in the application of artificial intelligence to Cantonese opera vocal synthesis and promotes the inheritance and innovation of this traditional art.

Key words: Cantonese opera, generative artificial intelligence, dataset, voice synthesis

CLC Number: