Journal of South China University of Technology (Natural Science Edition) ›› 2019, Vol. 47 ›› Issue (6): 10-17, 30. doi: 10.12141/j.issn.1000-565X.180486

• Computer Science & Technology •

A Sentence Sentiment Classification Method with POS and Attention

SU Jindian1 YU Shanshan2 LI Pengfei1   

  1. College of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, Guangdong, China
  • Received: 2018-09-27 Revised: 2019-02-09 Online: 2019-06-25 Published: 2019-05-05
  • Contact: YU Shanshan (b. 1980), female, Ph.D., lecturer; research interests include artificial intelligence and formal semantics. E-mail: susyu@139.com
  • About author: SU Jindian (b. 1980), male, Ph.D., associate professor; research interests include natural language processing, deep learning, and programming language design. E-mail: SuJD@scut.edu.cn
  • Supported by:
    The Applied Scientific and Technological Special Project of the Department of Science and Technology of Guangdong Province (20168010124010), the Natural Science Foundation of Guangdong Province (2015A030310318), and the Medical Scientific Research Foundation of Guangdong Province (A2015065)

Abstract: Most existing LSTM-based methods for sentence sentiment classification do not take the part-of-speech (POS) information of words into account. To address this problem, a neural network model, PALSTM (POS and Attention-based LSTM), which combines POS information with a self-attention mechanism, is proposed and applied to sentence sentiment classification. First, PALSTM uses pre-trained word vectors and a POS tagging tool to obtain semantic and POS vector representations of the words in a sentence, and takes them as the inputs of an LSTM to capture the long-term dependencies of words in both content and part of speech; this compensates for ordinary LSTM networks relying solely on the word co-occurrence information in pre-trained word vectors. Second, the self-attention mechanism is used to learn the position information of the words in the sentence and to build the corresponding position weight matrix, which yields the final semantic representation of the sentence. Finally, the result is classified and output by a multi-layer perceptron. Experiments show that PALSTM achieves higher accuracy than ordinary LSTM and attention-based LSTM models on the public corpora Movie Reviews, Internet Movie Database, and Stanford Sentiment Treebank (binary and fine-grained classification).

Key words: natural language processing, sentiment classification, neural network, part of speech, self attention
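
As a rough illustration of the pipeline summarized in the abstract (word and POS vectors fed to an LSTM, self-attention pooling over the hidden states, then a multi-layer perceptron), the following pure-Python sketch runs a single forward pass. It is not the authors' implementation: the dimensions, the random untrained weights, the omission of bias terms, the additive form of the attention score, and every function and parameter name (`init_params`, `classify`, `att_W`, `att_u`, and so on) are assumptions made for illustration. In the paper the word vectors are pre-trained and the POS tags come from a tagging tool; here both are replaced by integer ids and random embedding tables.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random matrix; stands in for trained weights (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lstm(inputs, W, hidden):
    """Single-layer LSTM without biases; W maps [x; h] to 4 stacked gates."""
    h, c, states = [0.0] * hidden, [0.0] * hidden, []
    for x in inputs:
        z = matvec(W, x + h)  # list concatenation: [x; h]
        i = [sigmoid(v) for v in z[:hidden]]                # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]      # forget gate
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]  # output gate
        g = [math.tanh(v) for v in z[3 * hidden:]]          # candidate cell
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states


def init_params(vocab, n_tags, d_word=8, d_pos=4, hidden=6, n_classes=2, seed=0):
    """Hypothetical parameter set; all sizes are illustrative choices."""
    rng = random.Random(seed)
    return {
        "hidden": hidden,
        "word_emb": rand_matrix(rng, vocab, d_word),
        "pos_emb": rand_matrix(rng, n_tags, d_pos),
        "lstm": rand_matrix(rng, 4 * hidden, d_word + d_pos + hidden),
        "att_W": rand_matrix(rng, hidden, hidden),
        "att_u": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "mlp_1": rand_matrix(rng, hidden, hidden),
        "mlp_2": rand_matrix(rng, n_classes, hidden),
    }


def classify(word_ids, pos_ids, params):
    """Return (class probabilities, attention weights) for one sentence."""
    # 1. Each token is the concatenation of its word vector and POS vector.
    inputs = [params["word_emb"][w] + params["pos_emb"][p]
              for w, p in zip(word_ids, pos_ids)]
    # 2. The LSTM produces one hidden state per token.
    states = lstm(inputs, params["lstm"], params["hidden"])
    # 3. Attention: score each state, normalize, take the weighted sum.
    scores = []
    for h in states:
        proj = [math.tanh(v) for v in matvec(params["att_W"], h)]
        scores.append(sum(u * v for u, v in zip(params["att_u"], proj)))
    weights = softmax(scores)
    sentence = [sum(a * h[j] for a, h in zip(weights, states))
                for j in range(params["hidden"])]
    # 4. Two-layer perceptron followed by softmax over the classes.
    hidden_out = [math.tanh(v) for v in matvec(params["mlp_1"], sentence)]
    return softmax(matvec(params["mlp_2"], hidden_out)), weights


if __name__ == "__main__":
    params = init_params(vocab=10, n_tags=5)
    probs, weights = classify([1, 4, 7], [0, 2, 3], params)
    print("class probabilities:", probs)
    print("attention weights:", weights)
```

Concatenating the two vectors per token is the simplest way to let one LSTM see both content and POS information; with untrained weights the output probabilities are close to uniform and carry no meaning until the parameters are learned.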
