Journal of South China University of Technology (Natural Science Edition), 2011, Vol. 39, Issue 7: 146-149, 155. doi: 10.3969/j.issn.1000-565X.2011.07.024

• Computer Science and Technology •

Extraction of Domain-Specific Phenomenal Terms Based on Separator and Contextual Terms

Liu Li, Liu Xiao-ming

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2010-10-29 Revised: 2011-03-08 Online: 2011-07-25 Published: 2011-06-03
  • Contact: Liu Li. E-mail: niceliuli@sina.com
  • About author: Liu Li (born 1983), male, Ph.D. candidate, mainly engaged in natural language processing research.
  • Supported by:

    National Natural Science Foundation of China (61003065)

Abstract:

Domain-specific phenomenal terms are usually compound phrases, which makes them difficult to extract from local context features with traditional machine learning methods. To address this, a method for extracting domain-specific phenomenal terms is proposed. First, a context-based method is used to extract a separator set. Then, the separator set is combined with contextual terms, and an improved NC-value algorithm is used to extract candidate domain-specific phenomenal terms. Finally, nominal terms are filtered out of the candidates to obtain the final result. Experiments show that the proposed method extracts domain-specific phenomenal terms more effectively than the word-frequency-based and separator-based methods.

Key words: term extraction, separator, compound, NC-value algorithm
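
The three steps named in the abstract (split text at separators, score candidates, filter nominal terms) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify their improved NC-value algorithm, so the sketch substitutes the textbook C-value and the usual 0.8/0.2 NC-value weighting. The separator set, the context counts and weights, and the `is_nominal` predicate are all hypothetical inputs supplied by the caller.

```python
import math
from collections import Counter


def split_on_separators(tokens, separators):
    """Cut a token sequence into candidate segments at separator tokens."""
    segments, current = [], []
    for tok in tokens:
        if tok in separators:
            if current:
                segments.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(tuple(current))
    return segments


def contains(short, long):
    """True if `short` is a contiguous sub-sequence of `long` (or equal to it)."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(segment_counts):
    """Textbook C-value (assumed here, not the paper's variant).

    Candidates are the distinct segments of two or more tokens. A candidate's
    frequency counts every segment that contains it, once per segment.
    """
    cands = [s for s in segment_counts if len(s) >= 2]
    total = {a: sum(n for s, n in segment_counts.items() if contains(a, s))
             for a in cands}
    scores = {}
    for a in cands:
        longer = [total[b] for b in cands if len(b) > len(a) and contains(a, b)]
        # Discount occurrences that are explained by longer candidates.
        f = (total[a] - sum(longer) / len(longer)) if longer else total[a]
        scores[a] = math.log2(len(a)) * f
    return scores


def nc_value(c_scores, context_counts, context_weight):
    """Standard NC-value mix: 0.8 * C-value + 0.2 * weighted context evidence."""
    return {
        a: 0.8 * c + 0.2 * sum(n * context_weight.get(w, 0.0)
                               for w, n in context_counts.get(a, {}).items())
        for a, c in c_scores.items()
    }


def extract_phenomenal_terms(sentences, separators, context_counts,
                             context_weight, is_nominal):
    """Split, score, drop nominal terms, and rank by descending score."""
    segment_counts = Counter(
        seg for sent in sentences for seg in split_on_separators(sent, separators)
    )
    scores = nc_value(c_value(segment_counts), context_counts, context_weight)
    kept = {a: s for a, s in scores.items() if not is_nominal(a)}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


# Toy English data; every value below is made up for illustration.
sentences = [s.split() for s in (
    "engine overheating damages the fuel pump",
    "severe engine overheating and the fuel pump",
    "engine overheating is severe",
)]
ranked = extract_phenomenal_terms(
    sentences,
    separators={"is", "and", "the", "damages"},
    context_counts={("engine", "overheating"): {"damages": 1}},
    context_weight={"damages": 1.0},
    is_nominal=lambda term: term == ("fuel", "pump"),
)
for term, score in ranked:
    print(" ".join(term), round(score, 3))
```

On this toy input, "fuel pump" is removed by the nominal filter and "engine overheating" ranks above "severe engine overheating". In practice the separator set would come from the context-based extraction step and the nominal filter from a part-of-speech or dictionary check; neither is modeled here.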