Journal of South China University of Technology (Natural Science Edition), 2009, Vol. 37, Issue (1): 91-95, 112.

• Computer Science and Technology •

A Parallel Closed-Cubing Algorithm Based on MapReduce

Xi Jian-qing  You Jin-guo  Tang De-you  Xiao Wei-ji

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2008-04-07 Revised: 2008-04-23 Online: 2009-01-25 Published: 2009-01-25
  • Contact: Xi Jian-qing (born 1962), male, professor and doctoral supervisor, working mainly on databases and information integration. E-mail: csjqxi@scut.edu.cn
  • Supported by: Science and Technology Planning Project of Guangdong Province (2004A10205003, 2006B11301001); Science and Technology Planning Project of Guangzhou (200623-D3081)

Abstract:

The closed cube is an efficient and important technique for data cube compression, yet its parallel computation has not been studied so far. This paper proposes a new parallel approach that applies the C-Cubing method within the MapReduce model. In the Map phase, the representative tuple and closed mask of every data cell are computed for each data block; in the Reduce phase, these partial results are aggregated to obtain the closed cells. Experimental results show that the proposed approach effectively speeds up the computation of closed cubes on large datasets.
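
The Map and Reduce steps named in the abstract can be illustrated with a minimal in-process sketch. It is not the authors' implementation. It assumes the closed-mask definition usually given for C-Cubing: bit d of a cell's mask is set when all tuples aggregated into the cell share one value on dimension d, and a cell is closed when no aggregated dimension has its bit set. The function names (`merge`, `map_partition`, `reduce_cells`, `closed_cube`) and the `ALL` marker are hypothetical. The sketch enumerates every cell naively, without the pruning a real C-Cubing implementation would use, and it uses the tuple itself as the representative instead of a tuple ID.

```python
from itertools import product

ALL = "*"  # marker for an aggregated dimension; assumed not to occur as a data value


def merge(a, b):
    """Combine two partial summaries (count, representative tuple, closed mask) of one cell."""
    count_a, rep_a, mask_a = a
    count_b, rep_b, mask_b = b
    # A dimension stays "shared" only if it was shared on both sides
    # and the two representatives agree on it.
    mask = tuple(
        ma and mb and va == vb
        for ma, mb, va, vb in zip(mask_a, mask_b, rep_a, rep_b)
    )
    return count_a + count_b, min(rep_a, rep_b), mask


def map_partition(rows):
    """Map step: summarize every cell that the rows of one data block roll up to."""
    partial = {}
    for row in rows:
        single = (1, row, (True,) * len(row))  # one tuple shares all its own values
        for keep in product((True, False), repeat=len(row)):
            cell = tuple(v if k else ALL for v, k in zip(row, keep))
            partial[cell] = merge(partial[cell], single) if cell in partial else single
    return partial


def reduce_cells(partials):
    """Reduce step: merge per-block summaries by cell and keep only closed cells."""
    merged = {}
    for partial in partials:
        for cell, summary in partial.items():
            merged[cell] = merge(merged[cell], summary) if cell in merged else summary
    closed = {}
    for cell, (count, _rep, mask) in merged.items():
        # Not closed if some aggregated dimension is shared by all covered tuples.
        if not any(v == ALL and shared for v, shared in zip(cell, mask)):
            closed[cell] = count
    return closed


def closed_cube(partitions):
    """Run the Map step on each data block, then one Reduce step over all results."""
    return reduce_cells(map_partition(rows) for rows in partitions)


if __name__ == "__main__":
    blocks = [
        [("a1", "b1", "c1"), ("a1", "b1", "c2")],  # data block 1
        [("a1", "b2", "c1")],                      # data block 2
    ]
    for cell, count in sorted(closed_cube(blocks).items()):
        print(cell, count)
```

On these three tuples the sketch keeps six closed cells, among them `('a1', '*', '*')` with count 3, and drops `('*', '*', '*')` because every tuple has `a1` on the first dimension. Carrying the representative tuple with the mask lets two partial masks be merged without revisiting the underlying tuples, which is what makes the per-block Map output sufficient for the Reduce step.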

Key words: data warehouse, online analytical processing, parallel algorithm, closed cube, MapReduce technology