Journal of South China University of Technology (Natural Science Edition) ›› 2014, Vol. 42 ›› Issue (5): 135-142. doi: 10.3969/j.issn.1000-565X.2014.05.021

• Computer Science & Technology •

SingleMapReduce: a MapReduce Programming Model Outputting Single HDFS File

Chen Ji-rong, Le Jia-jin

  1. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2013-11-19 Revised: 2014-03-23 Online: 2014-05-25 Published: 2014-04-01
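  • Contact: Chen Ji-rong (born 1971), male, lecturer, postdoctoral researcher, mainly engaged in research on big data platforms in the Hadoop ecosystem. E-mail: chenjirongdh@163.com
  • About author: Chen Ji-rong (born 1971), male, lecturer, postdoctoral researcher, mainly engaged in research on big data platforms in the Hadoop ecosystem.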
  • Supported by:

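    National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2010ZX01042-001-003)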

Abstract:

To obtain a single HDFS (Hadoop Distributed File System) file, which the classical MapReduce programming model cannot provide, a new MapReduce programming model named SingleMapReduce is presented. In this model, all files in an output directory are consolidated into a single HDFS file by intercepting the Job Successful state. Four features of HDFS are then summarized, and two concepts, Typical Distribution of Block and Atypical Distribution of Block, are proposed, on the basis of which metadata are integrated to obtain integrated files. Theoretical analysis and experimental results show that (1) a single MapReduce computation based on SingleMapReduce produces a single output file; (2) the output produced by one MapReduce computation can be split via file splitting; (3) a large-scale table or a large-scale file can be imported into HDFS in parallel; and (4) SingleMapReduce provides auxiliary support for the scalability of the name node.
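
As a rough illustration of the consolidation idea in the abstract, the following Python sketch models a name node as a plain dictionary from file path to an ordered list of blocks, and merges the part files of an output directory by concatenating their block lists without touching block data. It is a toy model under stated assumptions, not the authors' implementation: the names `ToyNameNode`, `Block`, and `consolidate` are hypothetical, and the `part-` file prefix is assumed from the usual MapReduce output naming.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """Hypothetical block record: an id and the number of bytes it holds."""
    block_id: int
    length: int


@dataclass
class ToyNameNode:
    """Hypothetical in-memory namespace: path -> ordered list of blocks."""
    files: Dict[str, List[Block]] = field(default_factory=dict)

    def consolidate(self, output_dir: str, target: str) -> None:
        """Merge every part file under output_dir into one entry.

        Only metadata moves: the block lists are concatenated in
        part-file order and the part-file entries are removed.
        """
        prefix = output_dir.rstrip("/") + "/part-"
        parts = sorted(p for p in self.files if p.startswith(prefix))
        merged: List[Block] = []
        for path in parts:
            merged.extend(self.files.pop(path))
        self.files[target] = merged

    def length(self, path: str) -> int:
        """Total file length, summed over the file's blocks."""
        return sum(b.length for b in self.files[path])


# Two reducer outputs, each ending in a partially filled block
# (block sizes are made-up numbers for the example).
nn = ToyNameNode()
nn.files["/out/part-r-00000"] = [Block(1, 128), Block(2, 40)]
nn.files["/out/part-r-00001"] = [Block(3, 128), Block(4, 7)]

# In this toy model, consolidation would be triggered once the job succeeds.
nn.consolidate("/out", "/out/result")
merged_ids = [b.block_id for b in nn.files["/out/result"]]
```

In this sketch the merged file keeps a short block (id 2) in the middle of its block list, a layout that a sequentially written HDFS file normally would not have.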

Key words: distributed computing system, metadata, MapReduce, Hadoop distributed file system, name node, data node, block