Journal of South China University of Technology (Natural Science Edition) ›› 2012, Vol. 40 ›› Issue (8): 82-87.

• Computer Science and Technology •

Topic Tracking Based on Spatiotemporal Contextual Model

Zhou Yi-peng1  Du Jun-ping2

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China; 2. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2012-06-01 Revised: 2012-07-23 Online: 2012-08-25 Published: 2012-07-01
  • Contact: Zhou Yi-peng (born 1976), male, Ph.D., lecturer, mainly engaged in research on artificial intelligence and Web mining. E-mail: yipengzhou@163.com
  • Supported by:

    National Natural Science Foundation of China (91024001, 61070142); Beijing Natural Science Foundation (4111002)

Abstract:

Existing topic models cannot accurately reflect how topics vary periodically and how they are distributed in space under a spatiotemporal context. Because Internet information usually carries contextual data such as publication time and place, a spatiotemporal contextual topic model for topic tracking is proposed. First, the multi-topic distribution of the data set is associated with spatiotemporal information to build the model, which describes the cycle and strength of topics. Then, the model parameters are estimated with the EM algorithm and used to compute topic snapshots and topic cycles. Finally, a time-series similarity calculation is used to identify subsequent information on a topic, thereby realizing topic tracking. Tracking experiments on food safety events indicate that, compared with topic tracking methods that rely only on text features, the proposed method markedly improves tracking efficiency and the tracking accuracy for multiple topics, which helps to achieve more precise topic information retrieval.

Key words: topic model, context, generative model, probability distribution, text processing
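
The final step of the abstract, judging subsequent topic information by similarity over time, can be illustrated with a minimal sketch. The abstract does not state which similarity measure or threshold the authors use, so everything below is an assumption made for illustration: each topic snapshot is represented as a word distribution for one time slice, cosine similarity is the measure, and `track`, `cosine`, the 0.3 threshold, and the toy snapshots are hypothetical rather than taken from the paper.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word distributions (dicts)."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def track(snapshots, threshold=0.3):
    """Compare each time slice with the previous one.

    snapshots: list of dicts, one word distribution per time slice,
               in chronological order (assumed representation).
    Returns a list of (t, similarity, continues) tuples for t = 1..len-1,
    where `continues` is True when the similarity reaches the threshold.
    """
    links = []
    for t in range(1, len(snapshots)):
        sim = cosine(snapshots[t - 1], snapshots[t])
        links.append((t, sim, sim >= threshold))
    return links


# Toy, made-up snapshots: the first two share the word "milk",
# the third has no words in common with the second.
snapshots = [
    {"milk": 0.5, "melamine": 0.5},
    {"milk": 0.5, "recall": 0.5},
    {"weather": 1.0},
]
links = track(snapshots)
for t, sim, continues in links:
    print(t, round(sim, 3), continues)
```

In this sketch the second slice is linked to the first because the two distributions overlap, while the third slice is treated as a different topic. A model like the one described in the abstract would presumably derive the snapshots from EM-estimated parameters and also use the spatial and periodic information, which this sketch leaves out.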