Journal of South China University of Technology(Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (12): 80-88.doi: 10.12141/j.issn.1000-565X.220013

Special Issue: Computer Science & Technology (2022)

• Computer Science & Technology •

BiLSTM-BiDAF Named Entity Recognition Based on Machine Reading Comprehension

WANG Jie, XIA Xiaoming

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2022-01-07 Online: 2022-12-25 Published: 2022-04-22
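  • Contact: WANG Jie (born 1972), female, Ph.D., associate professor, mainly engaged in research on artificial intelligence and natural language processing. E-mail: wj@bjut.edu.cn
  • About author: WANG Jie (born 1972), female, Ph.D., associate professor, mainly engaged in research on artificial intelligence and natural language processing.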
  • Supported by:
    the National Natural Science Foundation of China(61876010)

Abstract:

Named entity recognition (NER) is a fundamental task in natural language processing (NLP) and plays an important role in many downstream NLP tasks, including information extraction and machine translation. Existing NER methods are usually based on sequence labeling and extract the entities in each sentence independently, so they ignore the semantic information between sentences. NER methods based on machine reading comprehension encode important prior information about the entity class, which makes similar classification labels easier to distinguish and reduces the difficulty of model learning. However, these methods still model only at the sentence level and ignore the semantic information between sentences, which can easily lead to inconsistent entity labeling across sentences. To this end, this paper extended sentence-level NER to text-level NER and proposed a BiLSTM-BiDAF NER model based on machine reading comprehension. First, to utilize the context information of the whole text, the NEZHA pre-trained language model was used to obtain full-text information, and local features were further captured through BiLSTM to strengthen the model's ability to capture locally dependent information. Then, a bidirectional attention flow was introduced to learn the semantic association between the text and the entity category. Finally, to predict the positions of entities in the text, a boundary detector based on a gating mechanism was designed to strengthen the correlation between entity boundaries. At the same time, an answer count detector was established to identify unanswerable questions. Experimental results on the CCKS2020 Chinese electronic medical record dataset and the CMeEE dataset show that the model can effectively identify document-level and sentence-level named entities, with F1 scores reaching 84.76% and 57.35%, respectively.
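
The reading-comprehension framing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the question texts, the function names `build_mrc_examples` and `decode_spans`, the 0.5 threshold, and the nearest-end matching rule are all assumptions made for illustration, and the boundary scores and answer count are taken as given rather than produced by a NEZHA, BiLSTM, and BiDAF stack. The sketch only shows how one question per entity category might be paired with the whole text, and how a predicted answer count of zero could mark a question as unanswerable.

```python
from typing import Dict, List, Tuple

# Hypothetical category-to-question map; the abstract does not give
# the questions actually used in the paper.
QUERIES: Dict[str, str] = {
    "disease": "Find the diseases and diagnoses mentioned in the text.",
    "drug": "Find the drugs mentioned in the text.",
}


def build_mrc_examples(
    document: str, queries: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Pair every entity-category question with the whole document."""
    return [(label, question, document) for label, question in queries.items()]


def decode_spans(
    start_scores: List[float],
    end_scores: List[float],
    predicted_count: int,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Turn per-token boundary scores into inclusive (start, end) spans.

    predicted_count stands in for the output of an answer count detector:
    a value of 0 means the question is treated as unanswerable.
    """
    if predicted_count == 0:
        return []
    starts = [i for i, s in enumerate(start_scores) if s >= threshold]
    ends = [i for i, s in enumerate(end_scores) if s >= threshold]
    spans = []
    for start in starts:
        # Assumed matching rule: the nearest end boundary at or after the start.
        end = next((e for e in ends if e >= start), None)
        if end is not None:
            spans.append((start, end))
    return spans


# Toy scores for the tokens: patient, took, aspirin, for, chest, pain
drug_spans = decode_spans([0, 0, 0.9, 0, 0, 0], [0, 0, 0.8, 0, 0, 0], 1)
disease_spans = decode_spans([0, 0, 0, 0, 0.7, 0], [0, 0, 0, 0, 0, 0.9], 1)
no_spans = decode_spans([0.9, 0, 0, 0, 0, 0], [0.9, 0, 0, 0, 0, 0], 0)
```

In this toy run, the drug question yields a span covering "aspirin", the disease question yields a span covering "chest pain", and the third call returns no spans because the predicted answer count is zero, regardless of the boundary scores.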

Key words: bidirectional attention flow, bidirectional long short-term memory, named entity recognition, machine reading comprehension, natural language processing
