Journal of South China University of Technology (Natural Science Edition), 2008, Vol. 36, Issue 9: 43-47, 70.

• Computer Science and Technology •

Improved Majority Ordering Algorithm of Multi-Document Summarization Sentence

Jiang Xiao-yu  Fan Xiao-zhong  Chen Kang   

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received:2007-07-20 Revised:2007-09-04 Online:2008-09-25 Published:2008-09-25
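  • Contact: Jiang Xiao-yu (born 1979), male, Ph.D. candidate and lecturer, mainly engaged in natural language processing research. E-mail: jxy7334@sina.com
  • Supported by:

    Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20050007023)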

Abstract:

To overcome the shortcomings of the Chronological Ordering (CO) and Majority Ordering (MO) methods for ordering summary sentences, a new ordering algorithm is proposed that combines the cohesion between local topics (themes) with the MO method. Based on statistics of the relative positions of each pair of local topics, a directed graph of the topics is built and the cohesion between them is computed. During ordering, each time a vertex is output from the directed graph, the remaining vertices are searched for the one that has the greatest cohesion with it. If that cohesion exceeds a threshold, the summary sentences of the two local topics represented by these vertices are placed in adjacent positions in the summary. Experimental results show that summaries ordered by the proposed algorithm are more coherent and readable.

Key words: artificial intelligence, multi-document summarization, local topic, sentence ordering
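
The ordering procedure summarized in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the authors' implementation: the abstract does not define the cohesion measure, so the sketch assumes a hypothetical one (the fraction of documents containing both local topics in which the two appear next to each other). The vertex weight (outgoing minus incoming precedence counts) follows the usual description of Majority Ordering. All function names, the input layout (each document as a list of topic labels in order of appearance), and the default threshold are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(docs):
    """Edge weight (a, b) = number of documents in which topic a precedes topic b."""
    precedes = defaultdict(int)
    for doc in docs:
        # Topics in order of first appearance, so a always precedes b below.
        order = list(dict.fromkeys(doc))
        for a, b in combinations(order, 2):
            precedes[(a, b)] += 1
    return precedes


def cohesion(docs):
    """Assumed cohesion: share of co-occurring documents where two topics are adjacent."""
    adjacent = defaultdict(int)
    together = defaultdict(int)
    for doc in docs:
        order = list(dict.fromkeys(doc))
        for a, b in combinations(order, 2):
            together[frozenset((a, b))] += 1
        for a, b in zip(order, order[1:]):
            adjacent[frozenset((a, b))] += 1
    return {pair: adjacent[pair] / n for pair, n in together.items()}


def order_topics(docs, threshold=0.5):
    """Majority Ordering with a cohesion step after each output vertex."""
    precedes = build_graph(docs)
    coh = cohesion(docs)
    remaining = list(dict.fromkeys(t for doc in docs for t in doc))
    result = []

    def weight(v):
        # Outgoing minus incoming precedence counts among the remaining vertices.
        out_w = sum(precedes.get((v, u), 0) for u in remaining if u != v)
        in_w = sum(precedes.get((u, v), 0) for u in remaining if u != v)
        return out_w - in_w

    while remaining:
        v = max(remaining, key=weight)
        remaining.remove(v)
        result.append(v)
        if remaining:
            # Remaining vertex with the greatest cohesion with the one just output.
            u = max(remaining, key=lambda x: coh.get(frozenset((v, x)), 0.0))
            if coh.get(frozenset((v, u)), 0.0) > threshold:
                remaining.remove(u)
                result.append(u)  # place it directly after v
    return result


docs = [["A", "C"], ["A", "C"], ["A", "C", "B"], ["B", "C"], ["B", "C"]]
print(order_topics(docs, threshold=0.5))
print(order_topics(docs, threshold=1.0))
```

In this toy input, topic C is adjacent to A in every document that contains both, so with `threshold=0.5` the sketch returns `['A', 'C', 'B']`. With `threshold=1.0` the cohesion step never fires and the result is the plain majority order `['A', 'B', 'C']`.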