Journal of South China University of Technology (Natural Science Edition) ›› 2007, Vol. 35 ›› Issue (9): 90-94,106.
• Computer Science & Technology •
Yu Jiang-de, Fan Xiao-zhong, Yin Ji-hao
Supported by: Doctoral Program Foundation of the Ministry of Education of China (20050007023)
Abstract:
Header and citation information in research papers is necessary for many applications, such as field-based paper search, paper statistics, and citation analysis. The hidden Markov model (HMM) greatly restricts the use of context features in information extraction. To make better use of such features, a method based on conditional random fields (CRFs) is proposed to extract header and citation information from Chinese research papers. The key points of the proposed method are parameter estimation and feature selection: the L-BFGS algorithm is used to estimate the model parameters in the experiments, and four categories of features (location, layout, lexicon, and state transition) are selected as the feature set of the model. During extraction, format information about list separators and special labels is first used to segment the text, and CRFs are then applied to extraction within specific fields. Experimental results show that the proposed method performs better than the HMM-based method, and that the performance improvement varies with the feature set.
Key words: information extraction, conditional random field, citation information, paper header information
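The two preprocessing steps named in the abstract, segmenting text at list separators and describing each segment with location, layout, and lexicon features, can be sketched roughly as follows. This is a minimal illustration only: the separator set, the feature names, and the helper functions `segment` and `features` are assumptions made for the example, not the feature set or code used by the authors. Training a CRF on such features with L-BFGS would require a CRF library and is not shown.

```python
import re

# Hypothetical separator set; the paper does not list its actual separators.
SEPARATORS = r"[,.;:\[\]]"


def segment(text):
    """Split text at separator characters and drop empty pieces."""
    return [p.strip() for p in re.split(SEPARATORS, text) if p.strip()]


def features(segments, i):
    """Build an illustrative feature dict for the segment at index i."""
    seg = segments[i]
    last = len(segments) - 1
    return {
        # Location features: where the segment sits in the sequence.
        "position": i / max(last, 1),
        "is_first": i == 0,
        "is_last": i == last,
        # Layout features: surface shape of the segment.
        "has_digit": any(c.isdigit() for c in seg),
        "length": len(seg),
        # Lexicon feature: presence of a cue word (assumed word list).
        "has_journal_word": "journal" in seg.lower(),
    }


citation = "Yu J, Fan X. Information Extraction[J]. Journal of Testing, 2007, 35(9): 90-94."
segments = segment(citation)
feature_sequence = [features(segments, i) for i in range(len(segments))]
```

Each citation thus becomes a sequence of feature dicts, one per segment, which is the kind of input a linear-chain CRF labels with field names such as author, title, or year. State transition features are normally derived by the CRF itself from adjacent labels, so they do not appear in the per-segment dicts here.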
Yu Jiang-de, Fan Xiao-zhong, Yin Ji-hao. Information Extraction from Chinese Research Papers Based on Conditional Random Fields[J]. Journal of South China University of Technology (Natural Science Edition), 2007, 35(9): 90-94,106.
URL: https://zrb.bjb.scut.edu.cn/EN/
https://zrb.bjb.scut.edu.cn/EN/Y2007/V35/I9/90