Computer Science and Technology

OLDA-Based Model for Hot Topic Evolution and Tracking

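College of Computer Science / Network and Trusted Computing Institute, Sichuan University, Chengdu 610065, Sichuan, China
CHEN Xingshu (born 1969), female, professor and doctoral supervisor; her research focuses on information security and cloud computing security. E-mail: chenxsh@scu.edu.cn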

Received: 2015-02-11

Revised: 2015-08-21

Published online: 2016-04-12

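Funding

Supported by the National Science and Technology Support Program of China (2012BAH18B05) and the National Natural Science Foundation of China (61272447)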


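Abstract

To discover topics of interest in forum data and track how they evolve, this paper first uses the latent Dirichlet allocation (LDA) model to reduce the dimensionality of the texts from the word space to the topic space, then clusters the document collection in the topic space, and applies a proposed hot-topic detection method to identify hot topics. Building on the detected hot topics, the paper proposes a forum hot-topic evolution and tracking model (HTOLDA) based on the online LDA (OLDA) topic model. HTOLDA passes priors forward only for hot topics, and judges the status of a topic by setting a semantic distance between the same topic in adjacent time slices. Experimental results show that HTOLDA models the forum data set of each time slice better than OLDA does, and that it can effectively track the evolution of hot topics in forums.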
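The abstract names HTOLDA's two mechanisms without giving their formulas: only hot topics pass their previous-slice topic-word distribution forward as a prior, and a semantic distance between the same topic in adjacent time slices decides the topic's status. The sketch below illustrates both ideas under assumptions that are ours, not the paper's: the prior form `beta0 + weight * phi`, Jensen-Shannon divergence as the semantic distance, the threshold value, and the status labels `"continued"`/`"evolved"` are all hypothetical. LDA inference, clustering, and hot-topic detection are assumed to have already produced the topic-word distributions and the set of hot-topic indices.

```python
import math
from typing import Dict, List, Sequence, Set

Dist = Sequence[float]  # a topic-word distribution over a fixed vocabulary


def js_divergence(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical inputs, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Dist, y: Dist) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0
        # because y is the midpoint distribution m.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def next_slice_priors(phi_prev: List[Dist], hot: Set[int],
                      beta0: float, weight: float) -> List[List[float]]:
    """Topic-word Dirichlet priors for slice t built from slice t-1.

    Hypothetical simplification of OLDA-style prior passing: only topics in
    `hot` inherit weight * phi_k^(t-1); all other topics restart from the
    symmetric prior beta0.
    """
    priors = []
    for k, phi_k in enumerate(phi_prev):
        if k in hot:
            priors.append([beta0 + weight * p for p in phi_k])
        else:
            priors.append([beta0] * len(phi_k))
    return priors


def topic_status(phi_prev: List[Dist], phi_curr: List[Dist],
                 hot: Set[int], threshold: float) -> Dict[int, str]:
    """Label each hot topic by the JS divergence between adjacent slices.

    Assumes topic k in slice t continues topic k in slice t-1, which prior
    passing encourages. Labels and threshold are illustrative only.
    """
    status = {}
    for k in sorted(hot):
        d = js_divergence(phi_prev[k], phi_curr[k])
        status[k] = "continued" if d <= threshold else "evolved"
    return status


if __name__ == "__main__":
    # Toy 3-word vocabulary with 2 topics; real phi comes from LDA/OLDA inference.
    phi_prev = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    phi_curr = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1]]
    print(next_slice_priors(phi_prev, hot={0}, beta0=0.01, weight=100.0))
    print(topic_status(phi_prev, phi_curr, hot={0, 1}, threshold=0.1))
    # → {0: 'continued', 1: 'evolved'}
```

In the toy run, topic 0 shifts only slightly (divergence of roughly 0.01 bits) and is labeled continued, while topic 1 moves its mass to different words (roughly 0.40 bits) and is labeled evolved. The original OLDA formulation, as we understand it, combines topic-word distributions from a window of several previous slices through a weight vector; this sketch uses only the immediately preceding slice to keep the selective-prior idea visible.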

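Cite this article

CHEN Xingshu, GAO Yue, JIANG Hao, DU Min, WANG Haizhou, HE Jianyun. OLDA-based model for hot topic evolution and tracking[J]. Journal of South China University of Technology (Natural Science Edition), 2016, 44(5): 130-136. DOI: 10.3969/j.issn.1000-565X.2016.05.020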
