Computer Science and Technology

Performance Analysis of Distributed File System for Search Engine

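  • Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou 510640, Guangdong, China
Dong Shoubin (born 1967), female, professor and doctoral supervisor, whose research covers high-performance computing, information retrieval, and the next-generation Internet.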

Received: 2011-01-08

Published online: 2011-03-01

Funding

National Natural Science Foundation of China (61070092); CNGI Project of the National Development and Reform Commission (CNGI2008-109/122)

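Abstract

A search engine is a data-intensive application whose performance depends heavily on that of the underlying file system. This paper addresses the performance evaluation and optimization of distributed file systems in search engine environments. It first summarizes the factors that affect distributed file system performance and the related research progress. On this basis, it proposes an open architecture based on Hadoop and uses it to systematically evaluate the performance of HDFS and Lustre in search engine scenarios. Finally, to address the shortcomings of HDFS in write performance and small-file handling revealed by the experimental evaluation, it proposes improvement schemes, providing a reference for optimizing distributed file systems for search engines.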

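Cite this article

Dong Shoubin, Zhao Tiezhu. Performance Analysis of Distributed File System for Search Engine[J]. Journal of South China University of Technology (Natural Science Edition), 2011, 39(4): 7-14. DOI: 10.3969/j.issn.1000-565X.2011.04.002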
