Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue (9): 1-10. doi: 10.12141/j.issn.1000-565X.250134

• Computer Science & Technology •

CODS: An Audio-Text Aligned Dataset for Cantonese Opera Vocal Synthesis

LI Yue¹  HUANG Yihan¹  PENG Zhengwei²  XIE Jixuan¹  DU Yuye¹

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China;

  2. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, Guangdong, China

  • Published: 2025-09-25    Online: 2025-05-20

Abstract:

Chinese opera is one of China's traditional arts and has a distinctive musical expressiveness. Cantonese opera, one of the major Chinese opera genres and an important carrier of Lingnan culture, is inscribed on UNESCO's Representative List of the Intangible Cultural Heritage of Humanity. In recent years, generative artificial intelligence has shown strong capabilities in content creation; singing voice synthesis, for example, can produce natural-sounding singing from a given musical score, which offers a new approach to the digital preservation and innovation of Cantonese opera. However, collecting and curating opera data is hampered by poor audio quality and the complexity of annotating dialect, so high-quality opera datasets are extremely scarce. To address this, this paper applies singing voice synthesis techniques from pop music to Cantonese opera and presents the first audio-text aligned dataset with phoneme-level annotations for Cantonese opera vocal synthesis. First, the CODS dataset is constructed through a systematic pipeline. It is drawn from 29 original works by four renowned performers and totals 3.81 hours of audio, providing important support for Cantonese opera research and digitization. Second, extensive experiments on the dataset achieve Cantonese opera vocal synthesis with controllable lyrics, timbre, and melody. In addition, the paper establishes an evaluation scheme for Cantonese opera vocal synthesis and verifies the usability of the dataset through both subjective and objective evaluation. The CODS dataset fills a gap in the application of artificial intelligence to Cantonese opera vocal synthesis and supports the inheritance and innovation of this traditional art.

Key words: Cantonese opera, generative artificial intelligence, dataset, vocal synthesis