Journal of South China University of Technology (Natural Science Edition), 2005, Vol. 33, Issue (9): 25-29, 34.


Chinese Question Classification Based on Support Vector Machine

Yu Zheng-tao1  Fan Xiao-zhong2  Guo Jian-yi1

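  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, Yunnan, China; 2. Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, China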
  • Received: 2004-11-22  Published: 2005-09-25  Online: 2005-09-25
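  • Corresponding author: Yu Zheng-tao, E-mail: ztyu@bit.edu.cn
  • About the author: Yu Zheng-tao (born 1970), male, associate professor and part-time Ph.D. candidate at Beijing Institute of Technology. His research interests are natural language processing, Chinese question answering systems, and information extraction.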
  • Supported by:

    Project of the Yunnan Provincial Information Technology Fund (2002IT03)

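Abstract: At present, Chinese question classification generally relies on combination rules built from interrogative words and their related words. Because extracting such rules depends heavily on linguistic knowledge, and because it is hard to enumerate every feature rule, classification performance suffers. The support vector machine (SVM) is a machine learning method grounded in statistical learning theory that performs well on small-sample classification problems. This paper analyzes and defines the types of Chinese questions, builds an SVM-based question classification model, and describes in detail how the classification features are selected. Classification experiments that add semantic features on top of syntactic features reach an accuracy of 88.7%, showing that an SVM combining syntactic and semantic features classifies Chinese questions effectively.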

Key words: question-answering system, question classification, support vector machine, syntactic feature, semantic feature
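The abstract describes the approach only at a high level: take syntactic features (interrogative words and their related words), add semantic features, and train an SVM on them. The Python sketch that follows illustrates that idea under stated assumptions; it is not the authors' implementation. The four classes (PERSON, LOCATION, TIME, NUMBER), the interrogative list `QUESTION_WORDS`, the semantic lexicon `SEMANTIC_CLASS`, and the training questions are all hypothetical, since the abstract lists neither the paper's question taxonomy nor its selected features. Questions are assumed to be already word-segmented. The abstract also does not name an SVM implementation or kernel, so the sketch trains one linear SVM per class (one-vs-rest) by dual coordinate descent, which minimizes ½‖w‖² + C Σᵢ max(0, 1 - yᵢ w·xᵢ) with the bias folded into w as a constant feature.

```python
from collections import defaultdict

# Hypothetical word lists, for illustration only; the paper's actual
# interrogative list and semantic resource are not given in the abstract.
QUESTION_WORDS = {"谁", "哪里", "哪", "什么", "多少", "几", "何时"}
SEMANTIC_CLASS = {
    "发明": "act", "作者": "human", "地方": "place", "城市": "place",
    "时候": "time", "年": "time", "人口": "quantity", "面积": "quantity",
}


def extract_features(tokens):
    """Map a word-segmented question to a set of binary feature names."""
    feats = {"bias"}  # constant feature so each linear SVM can learn an offset
    for i, tok in enumerate(tokens):
        if tok in QUESTION_WORDS:
            # Syntactic-style features: the first interrogative and the word after it.
            feats.add("qw=" + tok)
            if i + 1 < len(tokens):
                feats.add("next=" + tokens[i + 1])
            break
    for tok in tokens:
        # Semantic features: the coarse class of every word found in the lexicon.
        if tok in SEMANTIC_CLASS:
            feats.add("sem=" + SEMANTIC_CLASS[tok])
    return feats


def train_binary_svm(samples, labels, C=10.0, epochs=200):
    """Linear SVM (hinge loss) trained by dual coordinate descent.

    samples: list of feature-name sets; labels: list of +1 / -1.
    Returns the weight vector as a sparse dict {feature: weight}.
    """
    w = defaultdict(float)
    alpha = [0.0] * len(samples)  # one dual variable per sample, kept in [0, C]
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(samples, labels)):
            q_ii = float(len(x))  # x.x for a binary feature vector
            grad = y * sum(w[f] for f in x) - 1.0
            # Projected gradient: zero means alpha[i] is already optimal.
            if alpha[i] == 0.0:
                pg = min(grad, 0.0)
            elif alpha[i] == C:
                pg = max(grad, 0.0)
            else:
                pg = grad
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - grad / q_ii, 0.0), C)
                for f in x:  # keep w = sum_i alpha_i * y_i * x_i up to date
                    w[f] += (alpha[i] - old) * y
    return dict(w)


def train_one_vs_rest(questions, classes, **kwargs):
    """Train one binary SVM per question class (class vs. all others)."""
    feats = [extract_features(q) for q in questions]
    return {
        c: train_binary_svm(feats, [1 if y == c else -1 for y in classes], **kwargs)
        for c in sorted(set(classes))
    }


def predict(models, tokens):
    """Return the class whose SVM gives the largest decision value."""
    x = extract_features(tokens)
    return max(models, key=lambda c: sum(models[c].get(f, 0.0) for f in x))


# Hypothetical, pre-segmented training questions.
TRAIN = [
    (["谁", "发明", "了", "电话"], "PERSON"),
    (["谁", "是", "红楼梦", "的", "作者"], "PERSON"),
    (["长城", "在", "哪里"], "LOCATION"),
    (["什么", "地方", "出产", "茶叶"], "LOCATION"),
    (["什么", "时候", "举行", "奥运会"], "TIME"),
    (["你", "哪", "年", "毕业"], "TIME"),
    (["中国", "有", "多少", "人口"], "NUMBER"),
    (["地球", "的", "面积", "是", "多少"], "NUMBER"),
]

models = train_one_vs_rest([q for q, _ in TRAIN], [c for _, c in TRAIN])
print(predict(models, ["什么", "时候", "开学"]))
```

In this toy data the interrogative 什么 alone is ambiguous, appearing in both a LOCATION and a TIME question; the following word (`next=时候`) and its semantic class (`sem=time`) occur only in TIME questions, so the unseen question 什么时候开学 ("when does school start") should be classified as TIME. This mirrors, on a very small scale, the abstract's point that semantic features complement syntactic ones.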
