Relation extraction (RE) is a fundamental task in information extraction within natural language processing (NLP). Its results can be used in downstream tasks such as knowledge graph construction, knowledge base question answering, and semantic search, so RE has wide-ranging application scenarios and important research value. In recent years RE has achieved fruitful results, but most studies are limited to sentence-level RE, which focuses on extracting the relation between two mentions within a single sentence. Research shows that a large number of relations cannot be extracted from a single sentence, and with the continued development of deep learning and NLP techniques, document-level RE faces new opportunities and challenges. This paper classifies and reviews recent advances in document-level RE, distills a general technical roadmap for the task, and analyzes the feature encoding and feature aggregation methods used in existing work. It also introduces the commonly used document-level RE datasets and evaluation metrics, and concludes with an outlook on future research trends.