Journal of South China University of Technology (Natural Science Edition), 2007, Vol. 35, Issue 9: 90-94, 106.

• Computer Science & Technology

Information Extraction from Chinese Research Papers Based on Conditional Random Fields

Yu Jiang-de, Fan Xiao-zhong, Yin Ji-hao

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2006-11-27 Online: 2007-09-25 Published: 2007-09-25
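  • Contact: Yu Jiang-de (1971-), male, Ph.D. candidate, whose research interests include natural language processing, information extraction, and information retrieval. E-mail: jangder@bit.edu.cn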
  • Supported by: Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20050007023)

Abstract:

Information from the headers and citations of research papers is needed by many applications, such as field-based paper search, paper statistics, and citation analysis. The hidden Markov model (HMM) greatly restricts the use of context features in information extraction. To make better use of such features, a method based on conditional random fields (CRFs) is proposed to extract header and citation information from Chinese research papers. The key issues of the method are parameter estimation and feature selection: in the experiments, the L-BFGS algorithm is used to estimate the model parameters, and four categories of features (location, layout, lexicon, and state transition) form the model's feature set. During extraction, formatting information such as list separators and special labels is first used to segment the text, and CRFs are then applied to extract the individual fields. Experimental results show that the proposed method outperforms the HMM-based method, and that the size of the improvement depends on the feature set used.
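To make the CRF labeling step more concrete, the following is a minimal Python sketch of a linear-chain CRF that labels the lines of a paper header and decodes the best label sequence with the Viterbi algorithm. It rests on several assumptions that are not from the paper: each line is treated as one token; the labels (`TITLE`, `AUTHOR`, `AFFIL`, `ABSTRACT`), the feature names, and the helper functions `observation_features` and `viterbi` are hypothetical; and the weights are hand-set stand-ins for parameters that the method described above estimates with L-BFGS. The features only loosely mirror the location, layout, lexicon, and state-transition categories, and the sketch covers neither the separator-based segmentation nor training.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label set for header lines (not taken from the paper).
LABELS = ["TITLE", "AUTHOR", "AFFIL", "ABSTRACT"]


def observation_features(lines: List[str], i: int) -> List[str]:
    """Return the names of the (hypothetical) binary features firing on line i."""
    line = lines[i]
    feats = ["loc=first" if i == 0 else "loc=other"]              # location
    feats.append("len=short" if len(line) < 40 else "len=long")   # layout
    # Lexicon features: simple keyword cues.
    if "University" in line or "大学" in line:
        feats.append("lex=univ")
    if line.startswith(("Abstract", "摘要")):
        feats.append("lex=abstract")
    if "," in line or "，" in line:
        feats.append("lex=comma")
    return feats


def viterbi(lines: List[str],
            emit_w: Dict[Tuple[str, str], float],
            trans_w: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the label sequence with the highest linear-chain CRF score."""
    n = len(lines)
    if n == 0:
        return []
    score: List[Dict[str, float]] = []          # best path score ending in each label
    back: List[Dict[str, Optional[str]]] = []   # back-pointers for recovery
    for i in range(n):
        feats = observation_features(lines, i)
        cur_score: Dict[str, float] = {}
        cur_back: Dict[str, Optional[str]] = {}
        for y in LABELS:
            # Sum of weighted observation features for label y at position i.
            local = sum(emit_w.get((f, y), 0.0) for f in feats)
            if i == 0:
                cur_score[y] = trans_w.get(("START", y), 0.0) + local
                cur_back[y] = None
            else:
                prev = score[i - 1]
                # State-transition features: pick the best previous label.
                best = max(LABELS, key=lambda p: prev[p] + trans_w.get((p, y), 0.0))
                cur_score[y] = prev[best] + trans_w.get((best, y), 0.0) + local
                cur_back[y] = best
        score.append(cur_score)
        back.append(cur_back)
    # Backtrack from the best final label.
    y = max(LABELS, key=lambda lab: score[-1][lab])
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hypothetical hand-set weights standing in for L-BFGS-estimated parameters.
EMIT = {
    ("loc=first", "TITLE"): 2.0,
    ("lex=comma", "AUTHOR"): 1.5,
    ("len=short", "AUTHOR"): 0.5,
    ("lex=univ", "AFFIL"): 2.5,
    ("lex=abstract", "ABSTRACT"): 3.0,
    ("len=long", "ABSTRACT"): 1.0,
}
TRANS = {
    ("START", "TITLE"): 1.0,
    ("TITLE", "AUTHOR"): 1.0,
    ("AUTHOR", "AFFIL"): 1.0,
    ("AFFIL", "ABSTRACT"): 1.0,
    ("ABSTRACT", "ABSTRACT"): 0.5,
}

HEADER = [
    "Information Extraction from Chinese Research Papers",
    "Yu Jiang-de, Fan Xiao-zhong, Yin Ji-hao",
    "Department of Computer Science, Example University, Beijing",
    "Abstract: The headers and citations of research papers are useful for many applications.",
]

print(viterbi(HEADER, EMIT, TRANS))  # ['TITLE', 'AUTHOR', 'AFFIL', 'ABSTRACT']
```

Because decoding only compares unnormalized sequence scores, the sketch never computes the CRF partition function. That normalizer matters during training, where an optimizer such as L-BFGS maximizes the conditional log-likelihood. Unlike an HMM's emission probabilities, the observation features here can inspect any part of the input (for example, the line's position or its punctuation), which is the advantage for context features that the abstract points to.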

Key words: information extraction, conditional random field, citation information, paper header information