Journal of South China University of Technology (Natural Science Edition), 2010, Vol. 38, Issue (7): 50-55. doi: 10.3969/j.issn.1000-565X.2010.07.009

• Computer Science & Technology •

Automatic Text Summarization Based on Thematic Word Weight and Sentence Features

Jiang Chang-jin (1), Peng Hong (1), Chen Jian-chao (2), Ma Qian-li (1)

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. School of Mathematics and Computational Science, Guangdong University of Business Studies, Guangzhou 510320, Guangdong, China
  • Received: 2009-12-04; Revised: 2010-02-28; Online: 2010-07-25; Published: 2010-07-25
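  • Contact: Jiang Chang-jin (born 1972), male, Ph.D. candidate; research interests include natural language processing, artificial intelligence, and intelligent computing. E-mail: jiangchangjin@163.com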
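  • About author: Jiang Chang-jin (born 1972), male, Ph.D. candidate; research interests include natural language processing, artificial intelligence, and intelligent computing.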
  • Supported by:

    Natural Science Foundation of Guangdong Province (07006474); Guangdong Provincial Science and Technology Research Project (2007B010200044)

Abstract:

To generate high-quality automatic text summaries, a formula based on a combined-word recognition algorithm is presented for computing the weight of each word in a text, taking into account word frequency, part of speech, word position, and word length. Under this formula, thematic words and phrases receive high weights. Each sentence is then weighted according to its content, its position, the cue words it contains, and the user's preference, and the final summary is generated with the similarity of candidate sentences taken into account, which avoids information redundancy. In addition, the evaluation approach based on the precision and recall of the summary is refined so that both are computed at the word level rather than the sentence level. Experimental results show that the proposed algorithm generates high-quality summaries, with an average precision of 77.1%.
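
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formula, so everything here is an assumption made for illustration: the function names (`word_weights`, `summarize`, `word_level_pr`), the bonus constants, the use of Jaccard overlap as the sentence similarity, and the omission of part of speech, cue words, and user preference. It shows only the general shape of the approach: weight words, score sentences, skip near-duplicate candidates, and evaluate at the word level.

```python
from collections import Counter


def word_weights(sentences, length_bonus=0.1, lead_bonus=0.5):
    """Hypothetical word weight (not the paper's formula): frequency scaled
    by word length and by whether the word occurs in the first sentence."""
    freq = Counter(w for s in sentences for w in s)
    lead = set(sentences[0]) if sentences else set()
    return {
        w: f * (1 + length_bonus * len(w)) * (1 + (lead_bonus if w in lead else 0))
        for w, f in freq.items()
    }


def jaccard(a, b):
    """Word-set overlap of two sentences, an assumed stand-in for similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize(sentences, k=2, max_sim=0.5):
    """Greedily pick up to k high-scoring sentences, skipping any candidate
    too similar to one already chosen; return indices in document order."""
    weights = word_weights(sentences)
    scores = [sum(weights[w] for w in s) / len(s) if s else 0.0 for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(jaccard(sentences[i], sentences[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return sorted(chosen)


def word_level_pr(system, reference):
    """Precision and recall over word multisets rather than whole sentences."""
    sys_c, ref_c = Counter(system), Counter(reference)
    overlap = sum((sys_c & ref_c).values())
    p = overlap / sum(sys_c.values()) if sys_c else 0.0
    r = overlap / sum(ref_c.values()) if ref_c else 0.0
    return p, r
```

Sentences are passed as lists of already-segmented tokens, since word segmentation and combined-word recognition are outside the scope of this sketch. Scoring at the word level gives partial credit to a system summary that shares words with the reference even when no complete sentence matches.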

Key words: thematic word, automatic text summarization, combined word, weight computing, sentence feature