Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue (9): 1-10. doi: 10.12141/j.issn.1000-565X.250134

• Computer Science & Technology •

CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis

LI Yue¹, HUANG Yihan¹, PENG Zhengwei², XIE Jixuan¹, DU Yuye¹

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China;
2. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, Guangdong, China

• Online: 2025-09-25  Published: 2025-05-20

Abstract:

As one of China's traditional art forms, Chinese opera has a unique musical expressiveness. Cantonese opera, one of the major Chinese opera genres and an important carrier of Lingnan culture, is inscribed on UNESCO's Representative List of the Intangible Cultural Heritage of Humanity. In recent years, generative artificial intelligence has shown strong capabilities in content creation. Singing voice synthesis, for example, can generate natural singing from a specified musical score, which offers a new approach to the digital preservation and innovation of Cantonese opera. However, collecting and organizing opera data is hampered by poor audio quality and complex dialect annotation, so high-quality opera datasets are extremely scarce. To address this, this paper transfers singing voice synthesis techniques developed for pop music to Cantonese opera and proposes CODS, the first Cantonese opera vocal synthesis dataset with phoneme-level annotation and audio-text alignment. First, CODS is built through a systematic construction process. It is drawn from 29 original works by four renowned performers, totals 3.81 hours of audio, and provides important support for Cantonese opera research and digitization. Second, extensive experiments on the dataset achieve Cantonese opera vocal synthesis with controllable lyrics, timbre, and melody. In addition, this paper establishes an evaluation scheme for Cantonese opera vocal synthesis and verifies the usability of the dataset through both subjective and objective evaluation. CODS fills a gap in artificial-intelligence-based Cantonese opera vocal synthesis and supports the transmission and innovation of this traditional art.

Key words: Cantonese opera, generative artificial intelligence, dataset, voice synthesis
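To make "phoneme-level annotation with audio-text alignment" concrete, here is a minimal sketch under stated assumptions. The `PhonemeSegment` record, the Jyutping initial/final split with tone digits on the finals, and the 10 ms frame hop are all hypothetical, illustrative choices, not the CODS file format. Each phoneme carries start and end times in seconds. The sketch checks that the intervals are ordered and contiguous, then expands them into per-frame labels, a form that duration-aware singing synthesis models often consume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeSegment:
    """Hypothetical record: one phoneme aligned to an audio interval (seconds)."""
    phoneme: str
    start: float
    end: float


def check_alignment(segments, tol=1e-6):
    """Return True if every segment has positive length and each one
    starts where the previous one ends (ordered, gap-free alignment)."""
    if any(seg.end <= seg.start for seg in segments):
        return False
    for prev, cur in zip(segments, segments[1:]):
        if abs(cur.start - prev.end) > tol:
            return False
    return True


def to_frame_labels(segments, hop_s):
    """Expand aligned segments into one phoneme label per frame.
    hop_s is an assumed frame hop in seconds; each frame is labeled
    by the segment that contains the frame's centre time."""
    n_frames = round(segments[-1].end / hop_s)
    labels = []
    for i in range(n_frames):
        t = (i + 0.5) * hop_s  # centre of frame i
        for seg in segments:
            if seg.start <= t < seg.end:
                labels.append(seg.phoneme)
                break
    return labels


# Hypothetical example: 粵劇 (Jyutping "jyut6 kek6") split into initials and finals.
example = [
    PhonemeSegment("j", 0.00, 0.08),
    PhonemeSegment("yut6", 0.08, 0.52),
    PhonemeSegment("k", 0.52, 0.60),
    PhonemeSegment("ek6", 0.60, 1.35),
]

print(check_alignment(example))                # True
print(len(to_frame_labels(example, hop_s=0.01)))  # 135
```

Storing explicit start and end times, rather than only per-phoneme durations, lets the same annotation be re-expanded at whatever frame hop a given acoustic model uses.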