MapReduce Job Performance Tuning by Optimizing Memory Configurations

doi:10.3969/j.issn.1000-565X.2017.01.015

Journal of South China University of Technology (Natural Science Edition), 2017, Vol. 45, Issue 1: 102-111.

LUO Yong-gang, CHEN Xing-shu, YANG Lu

Cybersecurity Research Institute, Sichuan University, Chengdu 610065, Sichuan, China

Received: 2015-11-25  Revised: 2016-09-13  Online: 2017-01-25  Published: 2016-12-01
Corresponding author: LUO Yong-gang (1980-), male, doctoral candidate; research interests: big data and network security. E-mail: iamlyg98@gmail.com
Supported by the National Science and Technology Support Planning Program of China (2012BAH18B05) and the National Natural Science Foundation of China (61272447)

Abstract:

MapReduce job performance depends heavily on memory configuration. To address the difficulty of predicting the memory requirements of MapReduce jobs, a generational memory prediction method is proposed that exploits the way the Java Virtual Machine (JVM) garbage collector divides the heap into young and old generations. First, a regression model is built that relates the young-generation size to the average garbage collection time. The problem of finding an appropriate young-generation size is then converted into a constrained nonlinear optimization problem, and a fixed-size search algorithm is designed to solve it. Next, models relating the performance of the Map and Reduce tasks of a MapReduce job to their memory allocation are constructed and solved for the memory requirement that yields optimal performance, which gives appropriate old-generation sizes for the Map and Reduce tasks. Finally, a k-means clustering algorithm is used to predict the value of the JVM parameter PretenureSizeThreshold, and the JVM configuration is tuned to reduce garbage collection pause time. Experimental results show that the proposed method can accurately predict the memory requirements of the Map and Reduce tasks of MapReduce jobs and significantly improves job performance.
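
The abstract does not give the form of the regression model or the details of the fixed-size search. As an illustration only, the sketch below assumes a quadratic model of average minor-GC pause versus young-generation size y, estimates the number of minor collections as A / y for a task that allocates A MB, and scans candidate sizes with a fixed step, keeping the size with the lowest estimated total GC time whose predicted pause stays under a limit. The names fit_pause_model, search_young_size, alloc_mb and max_pause_ms are hypothetical and do not come from the paper.

```python
"""Sketch: choose a young-generation size via regression + fixed-step search.

Assumptions (hypothetical, not from the paper): average minor-GC pause is a
quadratic in young-generation size y; a task allocating A MB triggers about
A / y minor collections; we minimize total GC time under a pause limit.
"""
from typing import Callable, List, Optional, Sequence, Tuple


def solve3(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for j in range(col, 4):
                a[r][j] -= f * a[col][j]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (a[r][3] - sum(a[r][j] * x[j] for j in range(r + 1, 3))) / a[r][r]
    return x


def fit_pause_model(sizes_mb: Sequence[float],
                    pauses_ms: Sequence[float]) -> Callable[[float], float]:
    """Least-squares fit of pause = c0 + c1*x + c2*x^2, with x = y / max(y)."""
    scale = float(max(sizes_mb))  # rescale sizes to keep normal equations well conditioned
    xs = [y / scale for y in sizes_mb]
    # Normal equations (X^T X) c = X^T t for the design matrix columns [1, x, x^2].
    m = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    v = [sum(t * x ** i for x, t in zip(xs, pauses_ms)) for i in range(3)]
    c0, c1, c2 = solve3(m, v)

    def pause(y_mb: float) -> float:
        x = y_mb / scale
        return c0 + c1 * x + c2 * x * x

    return pause


def search_young_size(pause: Callable[[float], float], alloc_mb: float,
                      y_min: int, y_max: int, step: int,
                      max_pause_ms: float) -> Optional[Tuple[int, float]]:
    """Fixed-step scan; returns (best size in MB, estimated total GC ms) or None."""
    best: Optional[Tuple[int, float]] = None
    for y in range(y_min, y_max + 1, step):
        p = pause(y)
        if p > max_pause_ms:          # constraint: per-collection pause limit
            continue
        total = (alloc_mb / y) * p    # ~number of minor GCs x average pause
        if best is None or total < best[1]:
            best = (y, total)
    return best


if __name__ == "__main__":
    # Synthetic profile: pause = 20 + 0.05*y + 0.0005*y^2 (ms), y in MB.
    sizes = [64, 128, 256, 512, 1024]
    pauses = [20 + 0.05 * y + 0.0005 * y * y for y in sizes]
    model = fit_pause_model(sizes, pauses)
    print(search_young_size(model, alloc_mb=4096, y_min=32, y_max=1024,
                            step=8, max_pause_ms=60))
```

With this synthetic profile the scan settles on 200 MB, an interior point where making the young generation larger (fewer but longer collections) stops paying off; the pause limit by itself would only rule out sizes above roughly 237 MB.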

Key words: big data, MapReduce, garbage collection, memory allocation, performance tuning
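
The abstract also states that k-means clustering is used to predict PretenureSizeThreshold, the HotSpot option (set with -XX:PretenureSizeThreshold, in bytes) above which objects are allocated directly in the old generation. How the paper builds its input data and turns clusters into a threshold is not described here, so the sketch below simply assumes sampled object sizes, k = 2, and a threshold at the midpoint between the two centroids, which is the 1-D k-means decision boundary. kmeans_1d and pretenure_threshold are hypothetical helper names.

```python
"""Sketch: derive a PretenureSizeThreshold from object-size samples with 1-D k-means.

Assumptions (hypothetical, not from the paper): allocation sizes in bytes were
sampled from task profiles; k = 2 separates ordinary objects from large buffers;
the threshold is the midpoint between the two centroids.
"""
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int = 2, iters: int = 100) -> List[float]:
    """Lloyd's algorithm in one dimension; returns centroids in ascending order."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids evenly spaced between min and max.
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Recompute means; keep a centroid unchanged if its cluster is empty.
        new = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return sorted(centroids)


def pretenure_threshold(sizes_bytes: Sequence[int]) -> int:
    """Midpoint between the 'small' and 'large' centroids, truncated to bytes."""
    small, large = kmeans_1d(sizes_bytes, k=2)
    return int((small + large) / 2)


if __name__ == "__main__":
    # Hypothetical samples: many small records plus a few 1 MiB buffers.
    samples = [64, 96, 128, 256, 512] * 20 + [1 << 20] * 5
    print(f"-XX:PretenureSizeThreshold={pretenure_threshold(samples)}")
```

Sizes are kept in bytes to match the unit of the flag; with the hypothetical samples the threshold falls between the largest ordinary record and the 1 MiB buffers, so only the buffers would be pretenured.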

LUO Yong-gang, CHEN Xing-shu, YANG Lu. MapReduce Job Performance Tuning by Optimizing Memory Configurations[J]. Journal of South China University of Technology (Natural Science Edition), 2017, 45(1): 102-111.
