Journal of South China University of Technology (Natural Science Edition), 2024, Vol. 52, Issue 2: 23-31. doi: 10.12141/j.issn.1000-565X.230034

• Computer Science and Technology •

Mutual Learning Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Feature Fusion

FU Pengbin, XU Yu, YANG Huirong

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2023-02-06 Online: 2024-02-25 Published: 2023-04-21
  • Contact: YANG Huirong (born 1971), female, Ph.D., engineer, whose main research interest is intelligent information systems. E-mail: yanghuirong@bjut.edu.cn
  • About author: FU Pengbin (born 1967), male, associate professor, whose main research interests include graphics and image processing and pattern recognition. E-mail: fupengbin@bjut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61772048); the Natural Science Foundation of Beijing (4153058); the Construction of High Quality Undergraduate Courseware for Beijing Education Commission (040000514122506)

Abstract:

Offline handwritten mathematical expressions have a complex two-dimensional structure, and the varying scale of their symbols and the diversity of writing styles make them difficult to recognize. This paper proposes a mutual learning model based on multi-scale feature fusion. First, multi-scale feature fusion is introduced in the encoding stage to improve the model's ability to extract fine-grained information from expressions and to strengthen its understanding of the semantics of the global two-dimensional structure. Second, paired handwritten and printed mathematical expressions are used to train the mutual learning model, whose objective includes a decoder loss and a context matching loss. The decoder loss learns LaTeX grammar, and the context matching loss learns the semantic invariance between handwritten and printed expressions, which improves the model's robustness to different writing styles and its understanding of the expression as a whole. Experiments were conducted on the CROHME 2014, 2016, and 2019 datasets. After introducing the multi-scale feature fusion mechanism, the expression accuracy reaches 55.25%, 52.31%, and 53.72%, respectively; after introducing the mutual learning mechanism, it reaches 55.43%, 53.53%, and 53.79%; after introducing both mechanisms together, it reaches 58.88%, 55.10%, and 57.05%. The experiments show that the proposed method effectively extracts features at different scales in expressions and that the mutual learning mechanism mitigates the problems of varied handwriting styles and limited data. Experimental results on the HME100K dataset further verify the effectiveness of the proposed model.

Key words: handwritten mathematical expression recognition, offline mode, handwritten MEs, printed MEs, semantic invariance
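
The training objective and the evaluation metric named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the decoder loss is a token-level cross-entropy applied to both the handwritten and the printed branch, that the context matching loss is a mean squared distance between the two branches' context vectors, and that expression accuracy counts exact matches of the predicted LaTeX token sequence. All function names and the `weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(step_probs: Sequence[Sequence[float]],
                  target_ids: Sequence[int]) -> float:
    """Decoder loss: mean negative log-likelihood of the target LaTeX tokens."""
    nll = -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))
    return nll / len(target_ids)


def context_matching_loss(hw_ctx: Sequence[Sequence[float]],
                          pr_ctx: Sequence[Sequence[float]]) -> float:
    """Assumed form: mean squared distance between the handwritten and
    printed context vectors at each decoding step."""
    total = 0.0
    for h, p in zip(hw_ctx, pr_ctx):
        total += sum((a - b) ** 2 for a, b in zip(h, p))
    return total / len(hw_ctx)


def mutual_learning_loss(hw_probs, pr_probs, target_ids,
                         hw_ctx, pr_ctx, weight: float = 1.0) -> float:
    """Decoder loss on both branches plus a weighted context matching term.
    The weighting scheme is an assumption, not taken from the paper."""
    decoder = (cross_entropy(hw_probs, target_ids)
               + cross_entropy(pr_probs, target_ids))
    return decoder + weight * context_matching_loss(hw_ctx, pr_ctx)


def expression_accuracy(predictions: List[List[str]],
                        references: List[List[str]]) -> float:
    """Percentage of expressions whose predicted token sequence matches
    the reference exactly."""
    exact = sum(1 for p, r in zip(predictions, references) if p == r)
    return 100.0 * exact / len(references)


# Toy usage: two decoding steps over a two-token vocabulary.
targets = [0, 1]
hw_probs = [[0.5, 0.5], [0.5, 0.5]]   # uncertain handwritten branch
pr_probs = [[1.0, 0.0], [0.0, 1.0]]   # confident printed branch
hw_ctx = [[1.0, 0.0], [0.0, 0.0]]
pr_ctx = [[0.0, 0.0], [0.0, 0.0]]
loss = mutual_learning_loss(hw_probs, pr_probs, targets, hw_ctx, pr_ctx)

acc = expression_accuracy([["x", "^", "2"], ["a"]],
                          [["x", "^", "2"], ["b"]])  # 50.0
```

In this sketch the context matching term pulls the handwritten branch's context vectors toward those of the printed branch for the same expression, which is one way to express the semantic invariance the abstract describes.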

CLC Number: