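Received date: 2013-11-19
Revised date: 2014-03-23
Online published: 2014-04-01
Supported by
National Science and Technology Major Project of China on Core Electronic Devices, High-end Generic Chips and Basic Software (2010ZX01042-001-003)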
SingleMapReduce: a MapReduce Programming Model Outputting Single HDFS File
Chen Jirong, Le Jiajin. SingleMapReduce: a MapReduce Programming Model Outputting Single HDFS File[J]. Journal of South China University of Technology (Natural Science Edition), 2014, 42(5): 135-142. DOI: 10.3969/j.issn.1000-565X.2014.05.021
To obtain a single HDFS (Hadoop Distributed File System) file, which the classical MapReduce programming model cannot provide, a new MapReduce programming model named SingleMapReduce is presented. In this model, all files in an output directory are consolidated into a single HDFS file by intercepting the Job Successful state. Four features of HDFS are then summarized, and two concepts, Typical Distribution of Block and Atypical Distribution of Block, are proposed, on the basis of which metadata are integrated to obtain integrated files. Theoretical analysis and experimental results show that (1) a single MapReduce computation based on SingleMapReduce produces a single output file; (2) the output of a single MapReduce computation can be split via file splitting; (3) a large-scale table or a large-scale file can be imported into HDFS in parallel; and (4) SingleMapReduce provides auxiliary support for the scalability of the name node.
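The consolidation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states that SingleMapReduce integrates block metadata, whereas this sketch simply concatenates bytes on a local file system standing in for HDFS. The function name `consolidate_output` and the target path are hypothetical; the `_SUCCESS` marker and `part-` file prefix follow the usual Hadoop output-directory layout and are assumed here as the signal that the job finished successfully.

```python
from pathlib import Path
import shutil


def consolidate_output(output_dir: str, target: str) -> int:
    """Concatenate all part-* files of a finished job into one target file.

    Hypothetical helper for illustration only. Returns the number of
    part files merged.
    """
    out = Path(output_dir)

    # Assumption: a _SUCCESS marker in the output directory stands in for
    # the "Job Successful" state that SingleMapReduce intercepts.
    if not (out / "_SUCCESS").exists():
        raise RuntimeError("job has not completed successfully")

    # Sort by name so that part-r-00000, part-r-00001, ... keep their order.
    parts = sorted(p for p in out.iterdir() if p.name.startswith("part-"))

    # Naive byte-level merge; the paper merges metadata instead of copying data.
    with open(target, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)

    return len(parts)
```

Calling `consolidate_output("job_output", "job_output/merged.txt")` on a directory that holds `_SUCCESS` and several `part-` files would leave one merged file next to them. A metadata-level merge, as the abstract describes, would avoid the data copy that this sketch performs.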