Journal of South China University of Technology (Natural Science Edition), 2008, Vol. 36, Issue (5): 43-47, 52.

• Computer Science & Technology •

Data Cross-Locating in Web Information Extraction

Chen Tian  Huang Min   

  1. School of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2007-06-04 Revised: 2007-09-30 Online: 2008-05-25 Published: 2008-05-25
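  • Contact: Chen Tian (born 1978), male, Ph.D., lecturer; research interests include Web information extraction, Chinese information processing, and educational informatization. E-mail: chentian@scut.edu.cn
  • About author: Chen Tian (born 1978), male, Ph.D., lecturer; research interests include Web information extraction, Chinese information processing, and educational informatization.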
  • Supported by:

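    Guangdong Provincial Science and Technology Plan Project (2006B11301001); Guangdong Provincial International Science and Technology Cooperation Plan Project (2007A050100026); Guangdong Provincial Industrial Science and Technology Key Project (2006B80407001)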

Abstract:

In general, when changes to a webpage exceed the tolerance of a wrapper script, the script has to be modified to re-locate the data. To solve this problem, this paper presents a new cross-locating method in which multiple coordinates are set up to locate the needed data. When one coordinate fails, the others can repair it automatically and still extract the data correctly. Experimental results show that a Web wrapper based on the cross-locating method greatly improves the tolerance of the wrapper script to changes in HTML webpages without decreasing information-extraction performance.
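
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of locating one value through several independent coordinates and flagging the ones that stop working. Everything in it is an assumption made for illustration: the sample pages, the three locator functions (`by_id`, `by_label`, `by_order`), the `cross_locate` helper, and the majority-vote rule are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# A "coordinate" is modeled here as a function that tries to find the value
# in the page and returns None when it cannot (hypothetical representation).
Locator = Callable[[str], Optional[str]]


def by_id(html: str) -> Optional[str]:
    """Locate the value through the element's id attribute."""
    m = re.search(r'<span id="price">([^<]*)</span>', html)
    return m.group(1).strip() if m else None


def by_label(html: str) -> Optional[str]:
    """Locate the value through the text label that precedes it."""
    m = re.search(r'Price:</b>\s*<span[^>]*>([^<]*)</span>', html)
    return m.group(1).strip() if m else None


def by_order(html: str) -> Optional[str]:
    """Locate the value as the first <span> in the page."""
    spans = re.findall(r'<span[^>]*>([^<]*)</span>', html)
    return spans[0].strip() if spans else None


LOCATORS: Dict[str, Locator] = {
    "by_id": by_id,
    "by_label": by_label,
    "by_order": by_order,
}


def cross_locate(
    html: str, locators: Dict[str, Locator]
) -> Tuple[Optional[str], List[str]]:
    """Return the value most locators agree on, plus the names of the
    locators that failed or disagreed and therefore need repair."""
    results = {name: locate(html) for name, locate in locators.items()}
    votes = Counter(v for v in results.values() if v is not None)
    if not votes:
        return None, list(results)
    value, _ = votes.most_common(1)[0]
    broken = [name for name, v in results.items() if v != value]
    return value, broken


# Illustrative pages: in the second one the id attribute has been removed.
old_page = '<div><b>Price:</b> <span id="price">42.00</span></div>'
new_page = '<div><b>Price:</b> <span class="amount">42.00</span></div>'

print(cross_locate(old_page, LOCATORS))
print(cross_locate(new_page, LOCATORS))
```

In this sketch the value is still extracted from `new_page` because two of the three locators agree, and `by_id` is reported as broken. A fuller wrapper could then regenerate the broken coordinate from the position the surviving ones agree on. That repair step is left out here because the abstract gives no details about it.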

Key words: Web information extraction, information retrieval, wrapper, cross-locating