Journal of South China University of Technology (Natural Science Edition) ›› 2011, Vol. 39 ›› Issue (4): 7-14. doi: 10.3969/j.issn.1000-565X.2011.04.002

• Computer Science & Technology •

Performance Analysis of Distributed File System for Search Engine

Dong Shou-bin  Zhao Tie-zhu   

  1. Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2011-01-08 Online: 2011-04-25 Published: 2011-03-01
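  • Contact: Dong Shou-bin (born 1967), female, professor and doctoral supervisor, whose research focuses on high-performance computing, information retrieval, and the next-generation Internet. E-mail: sbdong@scut.edu.cn
  • About author: Dong Shou-bin (born 1967), female, professor and doctoral supervisor, whose research focuses on high-performance computing, information retrieval, and the next-generation Internet.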
  • Supported by:

    National Natural Science Foundation of China (61070092); CNGI Project of the National Development and Reform Commission (CNGI2008-109/122)

Abstract:

A search engine is a data-intensive application, so its performance depends heavily on the underlying distributed file system. This paper addresses the performance evaluation and optimization of distributed file systems for search engine applications. First, the factors affecting distributed file system performance and the relevant research progress are summarized. Then, an open architecture based on Hadoop is designed to systematically evaluate the performance of HDFS and Lustre in search engine scenarios. Finally, several improved schemes are proposed to overcome the shortcomings of HDFS in write performance and small-file handling revealed by the performance assessment, providing a reference for the optimization of distributed file systems.

Key words: search engine, distributed file system, HDFS file system, Lustre file system, performance analysis, performance optimization