Computer Science and Technology

MSER and CNN-Based Method for Character Detection in Ancient Yi Books

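  • 1. College of Computer & Information Science, Southwest University, Chongqing 400715, China; 2. Research Institute of Yi Nationality Studies, Guizhou University of Engineering Science, Bijie 551700, Guizhou, China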
Chen Shanxiong (1981-), male, Ph.D., associate professor, whose research focuses mainly on pattern recognition and document analysis.

Received date: 2019-11-11

Revised date: 2020-01-20

Online published: 2020-06-01

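Supported by

National Natural Science Foundation of China (61872299); China Postdoctoral Science Foundation (Xm2016041); Natural Science Foundation of Chongqing (cstc2019jcyj-msxm2550); Open Project of the National Laboratory of Pattern Recognition (201900010); Fundamental Research Funds for the Central Universities, Southwest University (XDJK2018B020); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201801901)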

Abstract

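Character detection in ancient Yi books is an essential basis for recognizing ancient Yi characters, and the accuracy of detection directly determines the accuracy of recognition. Ancient Yi books have complex layouts, non-standard typesetting, and mixed text and graphics. To address these problems, a character detection method for ancient Yi books based on maximally stable extremal regions (MSER) and a convolutional neural network (CNN) is proposed. First, the scanned images of ancient Yi books are preprocessed with non-local means filtering. Next, an improved local adaptive binarization method produces a binary image that separates the foreground from the background. Then, non-text regions are removed with a method based on heuristic rules, which leaves the text regions. Finally, MSER and CNN are combined to detect individual characters in the books. Experimental results show that the method effectively separates text regions from non-text regions and achieves high accuracy and recall in single-character detection experiments. It therefore offers an effective solution to the character detection problem in character recognition for ancient books.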
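The binarization and heuristic non-text removal stages can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It uses standard Sauvola local thresholding instead of the paper's improved local adaptive method. The component filter thresholds (`min_area`, `max_area`, `max_aspect`) and all function names are hypothetical placeholders, not the heuristic rules used in the paper. Non-local means denoising and the MSER + CNN detection stage are omitted.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # grayscale image, pixel values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def integral(img: Image, square: bool = False) -> List[List[float]]:
    """Summed-area table with a zero row/column of padding."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0.0
        for x in range(w):
            v = img[y][x]
            row += v * v if square else v
            s[y + 1][x + 1] = s[y][x + 1] + row
    return s


def sauvola_binarize(img: Image, window: int = 15, k: float = 0.2,
                     r: float = 128.0) -> List[List[int]]:
    """Standard Sauvola thresholding (not the paper's improved variant).
    Returns 1 for ink (foreground) and 0 for background."""
    h, w = len(img), len(img[0])
    s, s2 = integral(img), integral(img, square=True)
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            total2 = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            mean = total / n
            std = max(0.0, total2 / n - mean * mean) ** 0.5
            # Local threshold T = m * (1 + k * (s / R - 1))
            threshold = mean * (1 + k * (std / r - 1))
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out


def connected_components(binary: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """4-connected foreground components, found by breadth-first search."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps


def filter_text_candidates(comps, min_area: int = 4, max_area: int = 2000,
                           max_aspect: float = 5.0) -> List[Box]:
    """Drop components that are too small (noise), too large, or too elongated
    (e.g. ruling lines). Hypothetical thresholds, not the paper's rules."""
    boxes = []
    for pixels in comps:
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        top, left, bottom, right = min(ys), min(xs), max(ys), max(xs)
        bh, bw = bottom - top + 1, right - left + 1
        aspect = max(bh, bw) / min(bh, bw)
        if min_area <= len(pixels) <= max_area and aspect <= max_aspect:
            boxes.append((top, left, bottom, right))
    return boxes
```

Integral images make each local mean and standard deviation cost O(1) per pixel regardless of window size. Consider a white page that contains a 3 × 3 ink blob, one isolated dark pixel, and a full-width horizontal rule. All three are binarized as ink, but only the blob survives the size and aspect-ratio filter.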

Cite this article

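CHEN Shanxiong, HAN Xu, LIN Xiaoyu, et al. MSER and CNN-based method for character detection in ancient Yi books[J]. Journal of South China University of Technology (Natural Science Edition), 2020, 48(6): 123-133. DOI: 10.12141/j.issn.1000-565X.190812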
