Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (4): 10-25. doi: 10.12141/j.issn.1000-565X.210152

Topic collection: Computer Science and Technology, 2022

• Computer Science and Technology •

A Survey on Document-Level Relation Extraction

ZHOU Youhua1, HUANG Han2, LIU Haolong3, HAO Zhifeng4

  1. School of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China;
  2. School of Mathematics and Big Data, Foshan University, Foshan 528225, Guangdong, China
  • Received: 2021-03-21 Revised: 2021-08-09 Online: 2022-04-25 Published: 2021-08-27
  • Contact: HUANG Han (born 1980), male, professor and doctoral supervisor, whose research focuses on the theory and application of intelligent algorithms. E-mail: hhan@scut.edu.cn
  • About author: ZHOU Youhua (born 1986), male, Ph.D. candidate, whose research focuses on big data auditing and knowledge graphs
  • Supported by:
    National Natural Science Foundation of China

Abstract: Relation extraction (RE) is a fundamental information extraction task in natural language processing (NLP). Its results can be used in downstream tasks such as knowledge graph construction, knowledge base question answering, and semantic search, so RE has a wide range of application scenarios and important research value. In recent years RE has achieved fruitful results, but most studies are limited to sentence-level RE, which focuses on extracting the relation between two mentions within a single sentence. Research shows that a large number of relations cannot be extracted from a single sentence, and with the continued development of deep learning and NLP, document-level RE faces new opportunities and challenges. This paper classifies and reviews recent advances in document-level RE, distills a general technology roadmap for the task, and analyzes the feature encoding and feature aggregation methods used in existing work. It also introduces the commonly used datasets and evaluation metrics for the task, and closes with an outlook on future research trends.

Key words: document-level, relation extraction, feature encoding, feature aggregation

CLC number: