Journal of South China University of Technology (Natural Science Edition) ›› 2021, Vol. 49 ›› Issue (1): 18-28. doi: 10.12141/j.issn.1000-565X.200489

Special topic: Computer Science and Technology, 2021

• Computer Science and Technology •

Rumor Identification in Major Sudden Epidemic Situation

LIU Kan1 HUANG Zheying2

  1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, Hubei, China; 2. School of Business, Nankai University, Tianjin 300071, China
  • Received: 2020-08-14 Revised: 2020-10-09 Online: 2021-01-25 Published: 2021-01-01
  • Contact / About author: LIU Kan (1970-), male, Ph.D., professor, mainly engaged in research on machine learning and data mining, social networks, and public opinion analysis. E-mail: liukan@zuel.edu.cn
  • Supported by:
    the General Program of the National Natural Science Foundation of China (71573196); the Fundamental Research Funds for the Central Universities, Zhongnan University of Economics and Law (2722020JX007)

Abstract: Since the outbreak of the COVID-19 epidemic, related rumors have circulated from time to time. Traditional rumor identification models struggle to identify epidemic rumors because, compared with the large volume of historical rumor data, the number of epidemic rumors is still too small to train a good classifier. It is therefore urgent and important to build an epidemic rumor identification model based on a small amount of rumor data. To address the shortage of training data and improve the identification of epidemic rumors, this paper proposes an identification method based on text enhancement and generative adversarial networks (GAN). First, the textual characteristics of epidemic rumors are analyzed to extract the feature words that characterize them. Second, an epidemic rumor generation model is constructed based on GAN: historical rumors that do not contain epidemic rumor features are textually enhanced with the epidemic rumor feature thesaurus, and a large amount of new rumor data containing epidemic rumor features is generated. Finally, the newly generated rumor data are added to the epidemic rumor data to train a more accurate epidemic rumor classification model. Experimental results show that expanding the training set with GAN improves identification performance by 3 percentage points and that the method clearly outperforms traditional machine learning and deep learning algorithms, providing a new approach to rumor identification in major sudden epidemic events.

Key words: COVID-19 epidemic, rumor identification, generation model, text enhancement

CLC number: