Journal of South China University of Technology (Natural Science Edition) ›› 2021, Vol. 49 ›› Issue (1): 10-17. doi: 10.12141/j.issn.1000-565X.200506

Special topic: Computer Science and Technology, 2021

• Computer Science and Technology •

Drug-Drug Interaction Extraction Model Combining Category Keywords with Attention Mechanism

IKA Novita Dewi, CAI Xiaoling, LIU Xiaofeng, DONG Shoubin

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2020-08-24 Revised: 2020-12-15 Online: 2021-01-25 Published: 2021-01-01
  • Contact: DONG Shoubin (born 1967), female, professor, whose research focuses on information retrieval, natural language processing, and high-performance computing. E-mail: sbdong@scut.edu.cn
  • About author: IKA Novita Dewi (born 1987), female, Ph.D. candidate, whose research focuses on natural language processing. E-mail: sbdong@scut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61976239); the Zhongshan Special Fund for Innovation of Introduced High-End Scientific Research Institutions (2019AG031)

Abstract: To better discriminate among samples of different categories and improve classification performance, a drug-drug interaction (DDI) extraction model named KA-BERT, which combines category keywords with an attention mechanism, is proposed. First, the keywords of each category are selected based on the chi-square test and document frequency. Then, position encodings of the keywords and the drug pairs are added to the pre-trained BERT model to make the differences among samples more salient, and the attention mechanism is used to learn the distribution information between the keywords and the other words in the sentence. To address the large number of negative samples in the DDI extraction task, a negative-sample filtering method based on rules and patterns is proposed, which effectively reduces the imbalance between positive and negative samples. Comparison with existing CNN-based, LSTM-based, and BERT-based DDI extraction models shows that KA-BERT improves DDI extraction performance, which demonstrates the effectiveness of the model. Test results on chemical-protein relation extraction show that the precision, recall, and F1 score of KA-BERT all improve noticeably, which further demonstrates the effectiveness and generality of the model.

Key words: drug interaction, category keywords, attention mechanism
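
The abstract states that category keywords are selected using the chi-square test and document frequency, but gives no further detail. The sketch below shows one plausible way such a selection could work; it is an illustration under stated assumptions, not the authors' implementation. The function names, the whitespace tokenization, the `min_df` and `top_k` parameters, the positive-association filter, and the toy sentences and labels are all hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def category_keywords(docs, labels, top_k=2, min_df=2):
    """Rank terms per class by chi-square (hypothetical helper).

    Terms with document frequency below min_df are dropped, and only
    terms positively associated with the class are kept.
    """
    doc_terms = [set(doc.lower().split()) for doc in docs]
    df = Counter(t for terms in doc_terms for t in terms)
    class_sizes = Counter(labels)
    class_df = {cls: Counter() for cls in class_sizes}
    for terms, label in zip(doc_terms, labels):
        class_df[label].update(terms)

    n = len(docs)
    result = {}
    for cls, size in class_sizes.items():
        scored = []
        for term, total_df in df.items():
            if total_df < min_df:
                continue  # document-frequency filter
            a = class_df[cls][term]
            b = total_df - a
            c = size - a
            d = n - size - b
            if a * d <= b * c:
                continue  # not positively associated with this class
            scored.append((chi_square(a, b, c, d), term))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[cls] = [term for _, term in scored[:top_k]]
    return result


# Toy data (assumed for illustration; drug names are masked as DRUG1/DRUG2).
docs = [
    "DRUG1 should not be combined with DRUG2",
    "patients should avoid DRUG1 while taking DRUG2",
    "DRUG1 increases the plasma concentration of DRUG2",
    "plasma concentration for DRUG1 is reduced by DRUG2",
]
labels = ["advice", "advice", "mechanism", "mechanism"]

keywords = category_keywords(docs, labels, top_k=2, min_df=2)
print(keywords)
```

In this toy run the masked drug tokens occur in every sentence, so they carry no class information and are filtered out, while words concentrated in one class rank highest. How the selected keywords are then encoded into BERT is not specified in the abstract and is not sketched here.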