Journal of South China University of Technology (Natural Science Edition) ›› 2012, Vol. 40 ›› Issue (1): 152-158.

• Computer Science & Technology •

An Improved Data Placement Strategy for Hadoop

Lin Wei-wei   

  1. School of Computer Engineering and Science, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2011-06-23 Revised: 2011-10-11 Online: 2012-01-25 Published: 2011-12-01
  • Contact: Lin Wei-wei (born 1980), male, Ph.D., lecturer, mainly engaged in research on distributed systems. E-mail: linww@scut.edu.cn
  • About author: Lin Wei-wei (born 1980), male, Ph.D., lecturer, mainly engaged in research on distributed systems.
  • Supported by:

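    National Natural Science Foundation of China (61070015); Natural Science Foundation of Guangdong Province (10451064101005155, S2011010001754, 9451063101002213); Science and Technology Planning Project of Guangdong Province (2010B010600032)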

Abstract:

Under Hadoop's default data placement strategy, restoring data from a remote DataNode takes considerable time when the local replicas become unavailable, and the random selection of DataNodes for data storage can upset load balancing. To address these problems, an improved data placement strategy is proposed. It computes a scheduling evaluation value for each DataNode from the DataNode's network distance and data load, and places remote replicas on the DataNode with the most favorable value. This balances the storage load and keeps data transmission efficient. The strategy is implemented on the Hadoop platform, and the results show that it outperforms the default strategy: it improves load balancing for data storage and reduces the time needed for data placement.
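The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's evaluation formula, so everything below is an assumption made for illustration: the `DataNode` fields, the weight `alpha`, the normalization of distance, and the rule that a lower score is better are all hypothetical and are not part of Hadoop or of the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DataNode:
    """Hypothetical view of a candidate DataNode (not a Hadoop class)."""
    name: str
    distance: int        # assumed network distance from the writer, e.g. in hops
    used_bytes: int      # storage currently in use
    capacity_bytes: int  # total storage capacity

    @property
    def load(self) -> float:
        """Fraction of capacity in use, between 0.0 and 1.0."""
        return self.used_bytes / self.capacity_bytes


def evaluation_value(node: DataNode, max_distance: int, alpha: float = 0.5) -> float:
    """Assumed score: weighted sum of normalized distance and data load.

    Lower is better. alpha = 1.0 considers distance only, alpha = 0.0 load only.
    """
    norm_distance = node.distance / max_distance if max_distance else 0.0
    return alpha * norm_distance + (1.0 - alpha) * node.load


def choose_remote_node(candidates: Sequence[DataNode], alpha: float = 0.5) -> DataNode:
    """Return the candidate with the lowest evaluation value."""
    if not candidates:
        raise ValueError("no candidate DataNodes")
    max_distance = max(n.distance for n in candidates)
    return min(candidates, key=lambda n: evaluation_value(n, max_distance, alpha))
```

In this sketch a nearby but nearly full node can lose to a more distant node with plenty of free space, which is the trade-off between network distance and data load that the abstract describes. How the paper actually weights the two factors is not stated here.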

Key words: Hadoop, data placement, load balancing, strategy

CLC Number: