Journal of South China University of Technology (Natural Science Edition), 2020, Vol. 48, Issue (6): 123-133. doi: 10.12141/j.issn.1000-565X.190812

• Computer Science & Technology •

MSER and CNN-Based Method for Character Detection in Ancient Yi Books

CHEN Shanxiong¹, HAN Xu¹, LIN Xiaoyu¹, LIU Yu², WANG Minggui²

  1. College of Computer & Information Science, Southwest University, Chongqing 400715, China; 2. Research Institute of Yi Nationality Studies, Guizhou University of Engineering Science, Bijie 551700, Guizhou, China
  • Received: 2019-11-11 Revised: 2020-01-20 Online: 2020-06-25 Published: 2020-06-01
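  • Contact: CHEN Shanxiong (b. 1981), male, Ph.D., associate professor, mainly engaged in research on pattern recognition and document analysis. E-mail: csxpml@163.com
  • About author: CHEN Shanxiong (b. 1981), male, Ph.D., associate professor, mainly engaged in research on pattern recognition and document analysis.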
  • Supported by:
    Supported by the National Natural Science Foundation of China (61872299), the China Postdoctoral Science Foundation (Xm2016041), and the Natural Science Foundation of Chongqing (cstc2019jcyj-msxm2550)

Abstract: Character detection is the basis for recognizing ancient Yi characters, and detection precision directly affects recognition accuracy. Because ancient Yi books have complex layouts, non-standard typesetting, and mixed text and graphics, a character detection method for ancient Yi books based on maximally stable extremal regions (MSER) and a convolutional neural network (CNN) is proposed. First, the scanned images of ancient Yi books are preprocessed with non-local means filtering. Second, a binary image is obtained with an improved local adaptive binarization method. Then, non-text areas are removed using heuristic rules. Finally, MSER and CNN are combined to detect single characters. The experimental results show that the proposed approach effectively separates text from non-text areas and achieves high accuracy and recall in single-character detection experiments, and that it effectively solves the character detection problem in the recognition of characters in ancient books.

Key words: ancient Yi books, character detection, binarization, maximally stable extremal region, convolutional neural network
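
The binarization step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' improved method, so the code below substitutes plain local-mean thresholding: a pixel counts as ink when it is darker than the mean of its neighbourhood by a margin. The function names, the `radius` and `offset` parameters, and the toy `page` are all assumptions made for illustration, not details from the paper.

```python
def integral_image(gray):
    """Return a (h+1) x (w+1) summed-area table for a 2-D list of pixel values."""
    h, w = len(gray), len(gray[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def local_mean_binarize(gray, radius=1, offset=20):
    """Mark a pixel as ink (1) if it is darker than its local mean minus `offset`.

    Hypothetical stand-in for the paper's improved local adaptive binarization.
    """
    h, w = len(gray), len(gray[0])
    sat = integral_image(gray)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Sum of the window, clipped to the image, in O(1) via the table.
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            count = (y1 - y0) * (x1 - x0)
            # Integer form of: pixel < mean - offset (avoids division).
            if gray[y][x] * count < total - offset * count:
                out[y][x] = 1
    return out


# Toy page: bright background on the left, dim on the right, one dark stroke in each half.
page = [
    [200, 200, 200, 120, 120, 120],
    [200,  60, 200, 120,  30, 120],
    [200, 200, 200, 120, 120, 120],
]
binary = local_mean_binarize(page)
```

A single global threshold of 128 would label the whole dim right half of this toy page as ink, whereas the local rule keeps only the two strokes. That sensitivity to uneven backgrounds is the usual motivation for local adaptive binarization on aged or stained pages.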