Journal of South China University of Technology (Natural Science Edition) ›› 2017, Vol. 45 ›› Issue (1): 102-111. doi: 10.3969/j.issn.1000-565X.2017.01.015

• Computer Science & Technology •

MapReduce Job Performance Tuning by Optimizing Memory Configurations

LUO Yong-gang, CHEN Xing-shu, YANG Lu

  1. Cybersecurity Research Institute, Sichuan University, Chengdu 610065, Sichuan, China
  • Received: 2015-11-25 Revised: 2016-09-13 Online: 2017-01-25 Published: 2016-12-01
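  • Contact: LUO Yong-gang (born 1980), male, Ph.D. candidate, whose research focuses on big data and network security. E-mail: iamlyg98@gmail.com
  • About author: LUO Yong-gang (born 1980), male, Ph.D. candidate, whose research focuses on big data and network security.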
  • Supported by:
    Supported by the National Science and Technology Support Planning Program of China (2012BAH18B05) and the National Natural Science Foundation of China (61272447)

Abstract:

MapReduce job performance depends heavily on memory configuration. To overcome the difficulty of predicting the memory requirements of MapReduce jobs, a generational memory prediction method is proposed, based on the fact that the Java Virtual Machine (JVM) divides the heap managed by its garbage collector into young and old generations. First, a regression model is constructed to estimate the average garbage collection time for a given young generation size. Then, the problem of finding a reasonable young generation size is converted into a constrained nonlinear optimization problem, and a fixed-size search algorithm is designed to solve it. Next, memory models of the Map and Reduce tasks are constructed to determine the memory required for optimal performance, which yields a reasonable old generation size for each task type. Finally, a k-means clustering algorithm is used to predict the value of the JVM parameter PretenureSizeThreshold, and the JVM configuration is tuned to reduce garbage collection pause time. Experimental results show that the proposed method accurately predicts the memory requirements of the Map and Reduce tasks and significantly improves job performance.
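
As an illustration of where such predictions would end up, the following minimal sketch turns a predicted young generation size, old generation size, and PretenureSizeThreshold value into JVM options for the Map and Reduce tasks. It is not the authors' implementation: the helper names `build_task_java_opts` and `job_memory_conf` and all numeric values are hypothetical, and the sketch assumes that the total heap is simply the sum of the two generations. The property names `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` and the flags `-Xms`, `-Xmx`, `-Xmn`, and `-XX:PretenureSizeThreshold` are standard Hadoop and HotSpot settings; note that HotSpot honors PretenureSizeThreshold only with some collectors.

```python
def build_task_java_opts(young_mb: int, old_mb: int, pretenure_bytes: int) -> str:
    """Assemble a JVM option string for one task type (hypothetical helper).

    Assumption: total heap = young generation + old generation.
    """
    if young_mb <= 0 or old_mb <= 0 or pretenure_bytes < 0:
        raise ValueError("generation sizes must be positive, threshold non-negative")
    heap_mb = young_mb + old_mb
    return (
        f"-Xms{heap_mb}m -Xmx{heap_mb}m "   # fixed total heap size
        f"-Xmn{young_mb}m "                 # young generation size
        f"-XX:PretenureSizeThreshold={pretenure_bytes}"  # threshold in bytes
    )


def job_memory_conf(map_sizes: tuple, reduce_sizes: tuple) -> dict:
    """Map predicted (young_mb, old_mb, pretenure_bytes) tuples to job properties."""
    return {
        "mapreduce.map.java.opts": build_task_java_opts(*map_sizes),
        "mapreduce.reduce.java.opts": build_task_java_opts(*reduce_sizes),
    }


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    conf = job_memory_conf((256, 768, 1048576), (512, 1536, 2097152))
    for key, value in conf.items():
        print(f"{key}={value}")
```

In practice the container size (for example `mapreduce.map.memory.mb`) would also need to exceed the heap size chosen here, since the JVM uses memory outside the heap.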

Key words: big data, MapReduce, garbage collection, memory allocation, performance tuning