Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (1): 80-90. doi: 10.12141/j.issn.1000-565X.210028

Special topic: Computer Science and Technology, 2022

• Computer Science and Technology •

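Handwritten Text Segmentation Based on the Greedy Snake Algorithm and Radical Recognition

FU Pengbin, DONG Aojing, YANG Huirong

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124
  • Received: 2021-01-18 Revised: 2021-05-24 Publication date: 2022-01-25 Release date: 2022-01-03
  • Corresponding author: YANG Huirong (born 1971), female, Ph.D., engineer; research focuses on intelligent information systems. E-mail: yanghuirong@bjut.edu.cn
  • About the author: FU Pengbin (born 1967), male, M.S., associate professor; research focuses on graphics and image processing, pattern recognition, and related areas. E-mail: fupengbin@bjut.edu.cn
  • Funding:
    National Natural Science Foundation of China (61772048); Beijing Natural Science Foundation (4153058); Beijing Municipal Education Commission Teaching Reform and Innovation Project (040000514120521)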

Handwritten Text Segmentation Method Based on Greedy Snake Algorithm and Radical Recognition

FU Pengbin DONG Aojing YANG Huirong   

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2021-01-18 Revised: 2021-05-24 Online: 2022-01-25 Published: 2022-01-03
  • Contact: YANG Huirong (born 1971), female, Ph.D., engineer; research focuses on intelligent information systems. E-mail: yanghuirong@bjut.edu.cn
  • About author: FU Pengbin (born 1967), male, M.S., associate professor; research focuses on graphics and image processing, pattern recognition, and related areas. E-mail: fupengbin@bjut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61772048) and the Natural Science Foundation of Beijing (4153058)

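Abstract: To address interlacing, adhesion, and intra-character over-separation in handwritten Chinese text, a text segmentation method based on the greedy snake algorithm and radical recognition is proposed. First, an initial segmentation trajectory is built with the greedy snake algorithm, and the segmentation path is optimized according to multiple rules. Next, candidate adhesion points are extracted from the contour and skeleton of the adherent characters, and the greedy snake algorithm is applied for a second segmentation. Finally, for over-segmented characters, the stroke segments of radicals are extracted and recognized, the merging direction is determined from the structure of the Chinese character, and merging is completed by combining geometric confidence and recognition confidence, which yields the final segmentation result. The algorithm was validated on 1542 lines of handwritten text from the examination papers of a high school in Shaanxi Province; the results show that its segmentation accuracy can reach 82.15%.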

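Keywords: handwritten Chinese text, adherent characters, greedy snake, over-segmentation merging, radical recognition, stroke segment extraction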

Abstract: A segmentation method based on the greedy snake algorithm and radical recognition was proposed to solve the problems of interlacing, adhesion and over-segmentation in handwritten Chinese text. Firstly, the original text segmentation trajectory was established with the greedy snake algorithm, and the segmentation path was optimized according to multiple rules. Then, candidate adhesion points were extracted from the outline and skeleton of the adherent characters, and the greedy snake algorithm was used for secondary segmentation. Finally, for over-segmented characters, the stroke segments of radicals were extracted and recognized, and the merging direction was determined from the structure of the Chinese characters. The over-segmented characters were then merged by combining geometric confidence and recognition confidence, which gave the final text segmentation result. The algorithm was verified experimentally on 1542 lines of handwritten text from the test papers of a high school in Shaanxi Province. The results show that the segmentation accuracy of the algorithm can reach 82.15%.

Key words: handwritten Chinese text, adherent character, greedy snake, over-segmentation merge, radical recognition, stroke extraction
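
The abstract does not give the path-following rules, so the following is only a minimal sketch of how a greedy, snake-like cut path might descend between characters in a binarized text-line image. The function name `trace_cut_path`, the 0/1 pixel encoding, and the move order (straight down, then down-left, then down-right, cutting through ink only when all three are blocked) are assumptions made for illustration, not the authors' rules.

```python
from typing import List, Tuple

# Assumed encoding: 1 = ink pixel, 0 = background pixel.
Image = List[List[int]]


def trace_cut_path(img: Image, start_col: int) -> Tuple[List[int], int]:
    """Trace one top-to-bottom cut path (hypothetical, simplified rules).

    Returns the column visited in each row and the number of ink pixels
    the path had to cross.
    """
    height, width = len(img), len(img[0])
    col = start_col
    path = [col]
    crossed = img[0][col]  # counts 1 if the path starts on ink
    for row in range(1, height):
        # Greedy preference: straight down, then down-left, then down-right.
        for cand in (col, col - 1, col + 1):
            if 0 <= cand < width and img[row][cand] == 0:
                col = cand
                break
        else:
            # All three neighbours are ink: keep the column and cut through.
            crossed += 1
        path.append(col)
    return path, crossed


# A small stroke that the path can slide around on the left.
sample = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
path, crossed = trace_cut_path(sample, start_col=2)
```

In a sketch like this, the count of crossed ink pixels could serve as a crude signal that a path passes through adherent strokes and that a second, finer segmentation step is needed; how the published method actually detects and handles that case is not stated in the abstract.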

CLC number: