Computer Science and Technology

Extraction of Domain-Specific Phenomenal Terms Based on Separator and Contextual Terms

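  • School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Liu Li (刘里, born 1983), male, Ph.D. candidate; research focus: natural language processing.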

Received: 2010-10-29

Revised: 2011-03-08

Published online: 2011-06-03

Funding

Supported by the National Natural Science Foundation of China (Grant No. 61003065)

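Abstract

Domain-specific phenomenal terms are usually compound phrases, which makes them hard to extract with traditional machine learning methods that rely on local context features. This paper therefore proposes an extraction method for such terms. The method first extracts a separator set using a context-based approach. It then combines the separator set with contextual terms and applies an improved NC-value algorithm to extract candidate domain-specific phenomenal terms. Finally, it filters nominal terms out of the candidates to obtain the final result. Experiments show that the proposed method extracts domain-specific phenomenal terms more effectively than the word-frequency-based and separator-based methods.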
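
The three steps in the abstract can be sketched roughly as follows. The abstract gives no formulas, so this is only an illustration under stated assumptions: it uses the standard NC-value weighting (0.8 for C-value, 0.2 for context evidence) rather than the paper's improved variant, it takes the separator set and the nominal-term list as ready-made inputs instead of deriving them from context, and every function name and the toy English data are hypothetical.

```python
import math
from collections import Counter

def split_by_separators(tokens, separators):
    """Cut a token list at separator tokens and return the chunks in between."""
    chunks, current = [], []
    for tok in tokens:
        if tok in separators:
            if current:
                chunks.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if current:
        chunks.append(tuple(current))
    return chunks

def ngrams(chunk):
    """All contiguous sub-sequences of a chunk, so nested candidates are counted too."""
    return [chunk[i:j] for i in range(len(chunk)) for j in range(i + 1, len(chunk) + 1)]

def is_nested(short, long):
    """True if `short` occurs as a contiguous sub-sequence of a strictly longer `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))

def c_value(cand, freq):
    """C-value, using the common log2(|a| + 1) variant so one-word candidates can score above 0."""
    longer = [b for b in freq if is_nested(cand, b)]
    f = freq[cand]
    if longer:
        # Discount occurrences already explained by longer candidates.
        f -= sum(freq[b] for b in longer) / len(longer)
    return math.log2(len(cand) + 1) * f

def nc_value(cand, freq, context_counts, context_weight):
    """Standard NC-value (assumed here): 0.8 * C-value + 0.2 * weighted context-term evidence."""
    ctx = sum(n * context_weight.get(w, 0.0)
              for w, n in context_counts.get(cand, {}).items())
    return 0.8 * c_value(cand, freq) + 0.2 * ctx

def extract_terms(sentences, separators, context_counts, context_weight, nominal_terms):
    """Split at separators, score candidates, drop nominal terms, and rank best first."""
    freq = Counter(g for s in sentences
                   for chunk in split_by_separators(s, separators)
                   for g in ngrams(chunk))
    scored = [(c, nc_value(c, freq, context_counts, context_weight))
              for c in freq if c not in nominal_terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy usage: hypothetical separators, context terms, and nominal terms.
sentences = [
    ["the", "engine", "stalls", "when", "the", "battery", "is", "low"],
    ["the", "engine", "stalls"],
]
ranked = extract_terms(
    sentences,
    separators={"the", "when", "is"},
    context_counts={("engine", "stalls"): {"reported": 1}},
    context_weight={"reported": 0.5},
    nominal_terms={("engine",), ("battery",)},
)
for candidate, score in ranked:
    print(" ".join(candidate), round(score, 3))
```

Counting every contiguous sub-sequence of a chunk keeps the frequency of a nested candidate at least as large as that of any longer candidate containing it, which is what the C-value discount assumes.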

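Cite this article

Liu Li, Liu Xiaoming. Extraction of Domain-Specific Phenomenal Terms Based on Separator and Contextual Terms[J]. Journal of South China University of Technology (Natural Science Edition), 2011, 39(7): 146-149, 155. DOI: 10.3969/j.issn.1000-565X.2011.07.024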

