Journal of South China University of Technology (Natural Science Edition) ›› 2009, Vol. 37 ›› Issue (1): 91-95,112.

• Computer Science & Technology •

A Parallel Closed-Cubing Algorithm Based on MapReduce

Xi Jian-qing  You Jin-guo  Tang De-you  Xiao Wei-ji   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2008-04-07 Revised: 2008-04-23 Online: 2009-01-25 Published: 2009-01-25
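  • Contact: Xi Jian-qing (born 1962), male, professor and doctoral supervisor, mainly engaged in research on databases and information integration. E-mail: csjqxi@scut.edu.cn
  • About author: Xi Jian-qing (born 1962), male, professor and doctoral supervisor, mainly engaged in research on databases and information integration.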
  • Supported by:

    Science and Technology Planning Project of Guangdong Province (2004A10205003, 2006B11301001); Science and Technology Planning Project of Guangzhou (200623-D3081)

Abstract:

Although the closed cube is an efficient and important technique for data cube compression, no parallel algorithm for computing it has been reported so far. This paper proposes a parallel approach that combines the C-Cubing technique with the MapReduce framework. In the Map phase, the representative tuple and closed mask of each cell are computed for every data block; in the Reduce phase, these partial results are aggregated to obtain the closed cells. Experimental results show that the proposed approach greatly speeds up closed cube computation on large-scale datasets.

Key words: data warehouse, online analytical processing, parallel algorithm, closed cube, MapReduce technology
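
The Map and Reduce steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual C-Cubing closedness measure: each cell carries a representative tuple and a closed mask whose bit d is set when all tuples in the cell share one value on dimension d, and a cell is closed when no masked bit falls on a dimension the cell leaves as "*". The names `merge`, `map_block`, and `reduce_cells` are hypothetical, the only measure is a count, and the sketch naively enumerates every cuboid in plain Python rather than running on a MapReduce cluster.

```python
from itertools import combinations

ALL = "*"  # marker for an aggregated ("all") dimension


def merge(a, b):
    """Merge two partial aggregates of the form (count, rep_tuple, closed_mask)."""
    count_a, rep_a, mask_a = a
    count_b, rep_b, mask_b = b
    # Bit d of eq is set when the two representative tuples agree on dimension d.
    eq = 0
    for d, (x, y) in enumerate(zip(rep_a, rep_b)):
        if x == y:
            eq |= 1 << d
    # A dimension stays uniform only if it was uniform on both sides
    # and the two sides agree on its value.
    return (count_a + count_b, rep_a, mask_a & mask_b & eq)


def map_block(block, n_dims):
    """Map step (sketch): partial aggregates for every cell of one data block."""
    full = (1 << n_dims) - 1  # a single tuple is uniform on every dimension
    partial = {}
    for row in block:
        for k in range(n_dims + 1):
            for dims in combinations(range(n_dims), k):
                cell = tuple(row[d] if d in dims else ALL for d in range(n_dims))
                agg = (1, row, full)
                partial[cell] = merge(partial[cell], agg) if cell in partial else agg
    return partial


def reduce_cells(partials, n_dims):
    """Reduce step (sketch): merge per-block results and keep the closed cells."""
    merged = {}
    for partial in partials:
        for cell, agg in partial.items():
            merged[cell] = merge(merged[cell], agg) if cell in merged else agg
    closed = {}
    for cell, (count, _rep, mask) in merged.items():
        # Bits of the dimensions that this cell aggregates away.
        all_mask = sum(1 << d for d, v in enumerate(cell) if v == ALL)
        if mask & all_mask == 0:  # no "*" dimension is uniform, so the cell is closed
            closed[cell] = count
    return closed


if __name__ == "__main__":
    block1 = [("a1", "b1", "c1"), ("a1", "b1", "c2")]
    block2 = [("a2", "b1", "c1")]
    result = reduce_cells([map_block(block1, 3), map_block(block2, 3)], 3)
    for cell, count in sorted(result.items()):
        print(cell, count)
```

Because `merge` is associative, the same function serves inside a block and across blocks, which is what allows the closedness check to be deferred to the Reduce step in this sketch.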