Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (1): 80-90. doi: 10.12141/j.issn.1000-565X.210028

Special Issue: Computer Science and Technology, 2022

• Computer Science & Technology •

Handwritten Text Segmentation Method Based on Greedy Snake Algorithm and Radical Recognition

FU Pengbin, DONG Aojing, YANG Huirong

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2021-01-18 Revised: 2021-05-24 Online: 2022-01-25 Published: 2022-01-03
  • Contact: YANG Huirong (born 1971), female, Ph.D., engineer, whose research focuses on intelligent information systems. E-mail: yanghuirong@bjut.edu.cn
  • About author: FU Pengbin (born 1967), male, M.S., associate professor, whose research focuses on graphics and image processing and pattern recognition. E-mail: fupengbin@bjut.edu.cn
  • Supported by: the National Natural Science Foundation of China (61772048) and the Natural Science Foundation of Beijing (4153058)

Abstract: A segmentation method based on the greedy snake algorithm and radical recognition is proposed to address interlaced characters, touching (adherent) characters, and over-segmentation in handwritten Chinese text. First, initial segmentation paths are generated with the greedy snake algorithm and then optimized according to multiple rules. Next, candidate adhesion points are extracted from the contour and skeleton of the touching characters, and the greedy snake algorithm is applied again to perform a secondary segmentation. Finally, radicals are extracted from the over-segmented characters and recognized, and the merging direction is determined from the structure of Chinese characters. The over-segmented components are then merged by combining geometric confidence with recognition confidence, which yields the final text segmentation result. The effectiveness of the algorithm was verified experimentally on 1542 lines of handwritten text taken from high school test papers from Shaanxi Province. The results show that the accuracy of the segmentation algorithm can reach 82.15%.
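
As a rough illustration of the path-tracing idea mentioned in the abstract, the sketch below walks a "snake" from the top of a binarized text-line image to the bottom, preferring to move straight down and stepping one column sideways when ink blocks the way. It is a simplified sketch under stated assumptions, not the authors' implementation: the function name `greedy_snake_path`, the image encoding (1 = ink, 0 = background), and the candidate order are all hypothetical, and the rule-based path optimization, adhesion-point detection, and radical-based merging described in the abstract are not modeled.

```python
def greedy_snake_path(image, start_col):
    """Trace a top-to-bottom cut through background pixels.

    image: list of rows, each a list of ints (1 = ink, 0 = background).
    start_col: column at which the snake enters the top row.
    Returns one column index per row, or None if the snake is blocked.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    path = []
    col = start_col
    for row in range(height):
        # Assumed preference order: same column, then left, then right.
        for candidate in (col, col - 1, col + 1):
            if 0 <= candidate < width and image[row][candidate] == 0:
                col = candidate
                break
        else:
            # All three candidates are ink or out of bounds: the characters
            # touch here, so this simple snake cannot separate them.
            return None
        path.append(col)
    return path


# Two strokes separated by a gap that shifts one column to the right.
sample = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1],
]
cut = greedy_snake_path(sample, start_col=2)
```

A `None` result in this sketch corresponds loosely to the touching-character case, which the abstract says is handled by a secondary segmentation at candidate adhesion points.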

Key words: handwritten Chinese text, adherent character, greedy snake, over-segmentation merge, radical recognition, stroke extraction
