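Received: 2011-01-09
Published online: 2011-03-01
Funding
Supported by the National Natural Science Foundation of China (60933004); the Guangdong Provincial Key Laboratory of Computer Network (CCNL200601); and the National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software Products" (2011ZX01042-001-001)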
Static Index Pruning Based on Document Importance
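Li Xiaoming, Shan Dongdong. Static Index Pruning Based on Document Importance [J]. Journal of South China University of Technology (Natural Science Edition), 2011, 39(4): 1-6. DOI: 10.3969/j.issn.1000-565X.2011.04.001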
Because the quality and importance of Web pages vary, this paper proposes a static index pruning method that uses Web page importance to determine the proportion of information kept for each document. Experimental results on the GOV2 dataset show that (1) the proposed method greatly reduces the storage size and speeds up search; (2) when the pruned index is only 13% of the original size, P@10 and P@20 reach or exceed the baseline obtained with the full index; and (3) at the same pruning level, the proposed method achieves better P@10, P@20 and MAP than the traditional method.
Key words: search engine; inverted index; static index pruning; document importance
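
The idea summarized in the abstract, keeping a larger share of postings for more important documents, can be illustrated with a minimal sketch. The abstract does not give the paper's actual importance measure or ratio function, so everything below is an assumption made for illustration: the function name `prune_index`, the linear mapping from an importance value in [0, 1] to a keep ratio between `min_keep` and `max_keep`, and the use of a per-posting score to rank terms within a document.

```python
import math
from collections import defaultdict


def prune_index(index, importance, min_keep=0.1, max_keep=0.5):
    """Document-centric static pruning sketch (hypothetical, not the paper's formula).

    index:      dict mapping term -> list of (doc_id, score) postings
    importance: dict mapping doc_id -> importance in [0, 1] (assumed normalized)
    """
    # Regroup the inverted index by document so each document can be pruned on its own.
    by_doc = defaultdict(list)
    for term, postings in index.items():
        for doc_id, score in postings:
            by_doc[doc_id].append((score, term))

    pruned = defaultdict(list)
    for doc_id, entries in by_doc.items():
        # Assumed rule: the keep ratio grows linearly with document importance.
        ratio = min_keep + (max_keep - min_keep) * importance.get(doc_id, 0.0)
        keep = max(1, math.ceil(ratio * len(entries)))
        # Keep the highest-scoring terms of this document; ties broken by term.
        entries.sort(key=lambda e: (-e[0], e[1]))
        for score, term in entries[:keep]:
            pruned[term].append((doc_id, score))

    # Restore doc_id order inside each posting list.
    for postings in pruned.values():
        postings.sort()
    return dict(pruned)


toy_index = {
    "a": [(1, 3.0), (2, 1.0)],
    "b": [(1, 2.0), (2, 5.0)],
    "c": [(1, 1.0), (2, 0.5)],
    "d": [(1, 0.5), (2, 0.2)],
}
toy_importance = {1: 1.0, 2: 0.0}
pruned = prune_index(toy_index, toy_importance, min_keep=0.25, max_keep=0.75)
```

In this toy run, document 1 (importance 1.0) keeps three of its four postings, while document 2 (importance 0.0) keeps only its single highest-scoring posting, so the term "d" disappears from the pruned index.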