Computer Science & Technology

SingleMapReduce: a MapReduce Programming Model Outputting Single HDFS File

  • School of Computer Science and Technology, Donghua University, Shanghai 201620, China
Chen Ji-rong (1971-), male, lecturer, postdoctoral researcher; research interest: big-data platforms built on the Hadoop ecosystem.

Received date: 2013-11-19

Revised date: 2014-03-23

Online published: 2014-04-01

Supported by

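National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2010ZX01042-001-003)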

Abstract

To obtain a single HDFS (Hadoop Distributed File System) output file, which the classical MapReduce programming model cannot provide, a new MapReduce programming model named SingleMapReduce is presented. In this model, all files in an output directory are consolidated into a single HDFS file by intercepting the Job Successful state. Four features of HDFS are then summarized, and two concepts, Typical Distribution of Block and Atypical Distribution of Block, are proposed; on this basis, the metadata of the output files are integrated to produce the consolidated file. Theoretical analysis and experimental results show that (1) a single MapReduce computation based on SingleMapReduce yields a single output file; (2) the output of a single MapReduce computation can be split by file splitting; (3) a large-scale table or a large-scale file can be imported into HDFS in parallel; and (4) SingleMapReduce provides auxiliary support for NameNode scalability.
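The abstract says the single file is obtained by integrating metadata rather than rewriting data. The following is a minimal sketch of that idea, assuming a toy in-memory namespace in which each file is only an ordered list of blocks. The names `Block`, `FileMeta`, `consolidate`, and `is_typical` are hypothetical; they are neither Hadoop APIs nor the authors' code. For illustration only, it also assumes that a "typical" block distribution means every block except the last one is full; the paper's exact definitions may differ. The sketch models just the merge step that would run once the job reaches its successful state.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass, field

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumption: 64 MB blocks (the Hadoop 1.x default)


@dataclass
class Block:
    block_id: int
    length: int  # bytes actually stored in this block


@dataclass
class FileMeta:
    """Hypothetical stand-in for a NameNode inode: a path plus an ordered block list."""
    path: str
    blocks: list[Block] = field(default_factory=list)

    @property
    def length(self) -> int:
        return sum(b.length for b in self.blocks)


def is_typical(f: FileMeta, block_size: int = BLOCK_SIZE) -> bool:
    """Assumed definition: every block except the last one is completely full."""
    return all(b.length == block_size for b in f.blocks[:-1])


def consolidate(namespace: dict[str, FileMeta], output_dir: str, target: str) -> FileMeta:
    """Merge the part-* files of output_dir into one file by splicing block lists.

    Only metadata changes: block data is never read or copied.
    """
    out = output_dir.rstrip("/")
    parts = sorted(
        p for p in namespace
        if posixpath.dirname(p) == out and posixpath.basename(p).startswith("part-")
    )
    merged = FileMeta(target)
    for p in parts:  # sorted order keeps reducer outputs in partition order
        merged.blocks.extend(namespace.pop(p).blocks)
    namespace[target] = merged
    return merged


# Usage: two reducer outputs plus the _SUCCESS marker that Hadoop writes.
ns = {
    "/out/part-r-00000": FileMeta("/out/part-r-00000", [Block(1, 64 * MB), Block(2, 10 * MB)]),
    "/out/part-r-00001": FileMeta("/out/part-r-00001", [Block(3, 64 * MB), Block(4, 64 * MB)]),
    "/out/_SUCCESS": FileMeta("/out/_SUCCESS"),
}
single = consolidate(ns, "/out", "/out.single")
print([b.block_id for b in single.blocks])  # [1, 2, 3, 4]
print(single.length // MB)  # 202
print(is_typical(single))  # False: block 2 holds only 10 MB yet is not the last block
```

Because only block lists are spliced, the cost of the merge under this model does not grow with the data volume. The trade-off is that a partially filled block can end up in the middle of the merged file, which is the kind of layout a distinction between typical and atypical block distributions would have to account for.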

Cite this article

Chen Ji-rong, Le Jia-jin. SingleMapReduce: a MapReduce Programming Model Outputting Single HDFS File[J]. Journal of South China University of Technology (Natural Science), 2014, 42(5): 135-142. DOI: 10.3969/j.issn.1000-565X.2014.05.021
