Computer Science & Technology

Performance Analysis of Distributed File System for Search Engine

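  • Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou 510640, Guangdong, China
Dong Shou-bin (born 1967), female, professor and doctoral supervisor; her research focuses on high-performance computing, information retrieval, and the next-generation Internet.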

Received date: 2011-01-08

  Online published: 2011-03-01

Supported by

National Natural Science Foundation of China (61070092); CNGI Project of the National Development and Reform Commission (CNGI2008-109/122)

Abstract

Because a search engine is a data-intensive application, its performance depends heavily on the underlying distributed file system. This paper addresses the performance evaluation and optimization of distributed file systems for search engine applications. First, the factors affecting distributed file system performance and the related research progress are summarized. Then, an open architecture based on Hadoop is designed to systematically evaluate the performance of HDFS and Lustre in search engine scenarios. Finally, several improved schemes are proposed to overcome the shortcomings of HDFS in write performance and small-file handling revealed by the performance assessment, providing a reference for the optimization of distributed file systems.

Cite this article

Dong Shou-bin, Zhao Tie-zhu. Performance Analysis of Distributed File System for Search Engine[J]. Journal of South China University of Technology (Natural Science), 2011, 39(4): 7-14. DOI: 10.3969/j.issn.1000-565X.2011.04.002
