Journal of South China University of Technology (Natural Science Edition) ›› 2011, Vol. 39 ›› Issue (4): 21-25. doi: 10.3969/j.issn.1000-565X.2011.04.004

• Computer Science and Technology •

Prediction of Search Data Volume Based on Time-Series Clustering and ARMA Models

Sun Cheng-jie  Liu Feng  Lin Lei  Liu Bing-quan

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  • Received: 2011-01-10 Online: 2011-04-25 Published: 2011-03-01
  • Contact: Sun Cheng-jie. E-mail: cjsun@insun.hit.edu.cn
  • About author: Sun Cheng-jie (born 1980), male, Ph.D., lecturer; his main research area is text mining.
  • Supported by:

    the National Natural Science Foundation of China (60973076, 61073127); the Fundamental Research Funds for the Central Universities, Harbin Institute of Technology (HIT.NSRIF.2010045)

Abstract:

To help businesses adjust product development and business strategy through predictive analysis of search volume data, the search volume data are organized into time series and then modeled and predicted with autoregressive moving average (ARMA) models. The time series are first clustered; an ARMA model is identified only for the center series of each cluster, and the other series in the same cluster are approximately modeled and predicted with that same model. After data preprocessing, similarity analysis, similarity-based clustering, and time-series prediction, predicted search volumes are obtained and compared with the actual values. The results show that fitting similar time series with a single ARMA model is feasible and gives high prediction accuracy. The clustering results also show that search volume data for products of the same brand tend to fall into the same cluster, which provides a reference for mining relationships among search terms.

Key words: time series, search data volume, ARMA model, dynamic time-warping distance, k-medoid algorithm
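
The clustering stage named in the key words (dynamic time-warping distance with a k-medoid algorithm) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the absolute-difference local cost, the naive choice of the first k series as initial medoids, the toy data, and the function names `dtw_distance` and `k_medoids`. The ARMA identification step for each medoid series is omitted here; it would normally be done with a time-series library.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumption: local cost is the absolute difference |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(series, k, max_iter=100):
    """Cluster series around k medoids using pairwise DTW distances.

    Returns (medoids, labels): indices of the medoid series, and the
    cluster index assigned to each series.
    """
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]

    def assign(medoids):
        # Each series joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # assumption: naive initialisation with the first k series
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy data (hypothetical): two rising and two falling series of unequal length.
series = [
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [1, 1, 2, 3, 4, 5],
    [5, 5, 4, 3, 2, 1],
]
medoids, labels = k_medoids(series, k=2)
print(medoids, labels)
```

In the scheme the abstract describes, an ARMA model would then be identified for each medoid series only and reused to forecast the other series carrying the same label.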