Mutual Learning Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Feature Fusion

doi:10.12141/j.issn.1000-565X.230034

Journal of South China University of Technology (Natural Science Edition), 2024, Vol. 52, Issue 2: 23-31.

FU Pengbin, XU Yu, YANG Huirong

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Received: 2023-02-06 Published: 2024-02-25 Online: 2023-04-21
Corresponding author: YANG Huirong (born 1971), female, PhD, engineer, mainly engaged in research on intelligent information systems. E-mail: yanghuirong@bjut.edu.cn
About the author: FU Pengbin (born 1967), male, associate professor, mainly engaged in research on graphics and image processing and pattern recognition. E-mail: fupengbin@bjut.edu.cn
Supported by:
the National Natural Science Foundation of China (61772048); the Natural Science Foundation of Beijing (4153058); the Beijing Municipal Education Commission Project for High-Quality Undergraduate Textbooks and Courseware (040000514122506)

Abstract

Offline handwritten mathematical expressions have a complex two-dimensional structure, and the varying scales of their symbols together with diverse writing styles make them difficult to recognize. This paper proposes a mutual learning model based on multi-scale feature fusion. First, multi-scale feature fusion is introduced in the encoding stage to improve the model's ability to extract fine-grained information from an expression and to strengthen its understanding of the semantics of the global two-dimensional structure. Second, paired handwritten and printed expressions are used to train the mutual learning model, whose objective combines a decoder loss and a context matching loss: the former learns LaTeX grammar and the latter learns the semantic invariance between handwritten and printed expressions, which makes the model more robust to different writing styles and improves its understanding of the expression as a whole. Experiments were conducted on the CROHME 2014, 2016 and 2019 test sets. With multi-scale feature fusion alone, the expression recognition rate reaches 55.25%, 52.31% and 53.72%, respectively; with mutual learning alone, it reaches 55.43%, 53.53% and 53.79%; with both mechanisms together, it reaches 58.88%, 55.10% and 57.05%. These results show that the proposed method effectively extracts features at different scales and, through mutual learning, mitigates the problems of diverse handwriting styles and limited training data. Experiments on the HME100K dataset further confirm the effectiveness of the proposed model.

Key words: handwritten mathematical expression recognition, offline mode, handwritten expressions, printed expressions, semantic invariance
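The abstract describes the training objective only in words. Written out as a minimal sketch, assuming the handwritten branch (superscript h) and the printed branch (superscript p) share the target token sequence y_1, ..., y_T and the two parts are summed with a weight λ (a hypothetical hyperparameter; no weighting is stated here), the objective could take this form:

```latex
\mathcal{L}
  = \underbrace{-\sum_{t=1}^{T}\log p^{h}_{t}(y_t)\;-\;\sum_{t=1}^{T}\log p^{p}_{t}(y_t)}_{\text{decoder loss}}
  \;+\;\lambda\,\underbrace{\frac{1}{T}\sum_{t=1}^{T} D\!\left(\mathbf{c}^{h}_{t},\,\mathbf{c}^{p}_{t}\right)}_{\text{context matching loss}}
```

Here p^h_t and p^p_t are the decoder's output distributions at step t for the handwritten and the printed image, c^h_t and c^p_t are per-step context representations of the two branches (assumed notation), and D is a distance such as mean squared error or KL divergence. The decoder terms teach LaTeX syntax, while the matching term pulls the handwritten representation towards its printed counterpart, which is the semantic invariance the abstract refers to.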

CLC number: TP391

Citation: FU Pengbin, XU Yu, YANG Huirong. Mutual Learning Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Feature Fusion[J]. Journal of South China University of Technology (Natural Science Edition), 2024, 52(2): 23-31.

References

1	MOUCHERE H, GAUDIN C V, ZANIBBI R, et al. ICFHR 2016 CROHME: competition on recognition of online handwritten mathematical expressions[C]//Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). Shenzhen: IEEE, 2017: 607-612.
2	JIN Jian-ming, JIANG Hong-ying, WANG Qing-ren. Survey of mathematical expression image processing[J]. Pattern Recognition and Artificial Intelligence, 2005, 18(4): 429-440. (in Chinese)
3	SIMISTIRA F, PAPAVASSILIOU V, KATSOUROS V, et al. Recognition of spatial relations in mathematical formulas[C]//Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR). Hersonissos: IEEE, 2014: 164-168.
4	NAZEMI A, TAVAKOLIAN N, FITZPATRICK D, et al. Offline handwritten mathematical symbol recognition utilising deep learning[EB/OL]. (2019-10-22)[2023-01-09].
5	LODS A, ANQUETIL E, MACE S. Fuzzy visibility graph for structural analysis of online handwritten mathematical expressions[C]//Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney: IEEE, 2019: 641-646.
6	LAVANYA K, BAJAJ S, TANK P, et al. Handwritten digit recognition using hoeffding tree, decision tree and random forests: a comparative approach[C]//Proceedings of the 2017 International Conference on Computational Intelligence in Data Science (ICCIDS). Chennai: IEEE, 2017: 1-6.
7	ALTAN A, KARASU S, ZIO E. A new hybrid model for wind speed forecasting combining long short-term memory neural network, decomposition methods and grey wolf optimizer[J]. Applied Soft Computing, 2021, 106996/1-20.
8	CHEN Lu, CHEN Daoxi, LU Yiming, et al. Handwritten mathematical expression recognition model based on attention mechanism and encoder-decoder[J]. Journal of Computer Applications, 2023, 43(4): 1297-1302. (in Chinese)
9	ZHANG J, DU J, ZHANG S L, et al. Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition[J]. Pattern Recognition, 2017, 71: 196-206.
10	ZHANG J S, DU J, DAI L R. A GRU-based encoder-decoder approach with attention for online handwritten mathematical expression recognition[C]//Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Kyoto: IEEE, 2017: 902-907.
11	ZHANG J S, DU J, DAI L R. Multi-scale attention with dense encoder for handwritten mathematical expression recognition[C]//Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR). Beijing: IEEE, 2018: 2245-2250.
12	WU J W, YIN F, ZHANG Y M, et al. Image-to-markup generation via paired adversarial learning[C]//Proceedings of the Machine Learning and Knowledge Discovery in Databases. Cham: Springer, 2018: 18-34.
13	WU J W, YIN F, ZHANG Y M, et al. Handwritten mathematical expression recognition via paired adversarial learning[J]. International Journal of Computer Vision, 2020, 128: 2386-2401.
14	LE A D. Recognizing handwritten mathematical expressions via paired dual loss attention network and printed mathematical expressions[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020: 2413-2418.
15	ZHAO W Q, GAO L C, YAN Z Y, et al. Handwritten mathematical expression recognition with bidirectionally trained transformer[C]//Proceedings of the Document Analysis and Recognition - ICDAR 2021. Cham: Springer, 2021: 570-584.
16	BIAN X H, QIN B, XIN X Z, et al. Handwritten mathematical expression recognition via attention aggregation based bi-directional mutual learning[EB/OL]. (2022-09-04)[2023-01-03].
17	ZHANG Y, XIANG T, HOSPEDALES T M, et al. Deep mutual learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2018: 4320-4328.
18	FU Pengbin, LI Jianjun, YANG Huirong. Handwritten formula recognition based on segmentation of adhesive symbols and multi-feature fusion[J]. Journal of Beijing University of Technology, 2021, 47(8): 842-853. (in Chinese)
19	HUANG G, LIU Z, MAATEN V, et al. Densely connected convolutional networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 2261-2269.
20	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. (2021-01-23)[2023-01-16].
21	ZHAO W Q, GAO L C. CoMER: modeling coverage for transformer-based handwritten mathematical expression recognition[EB/OL]. (2022-07-13)[2023-01-15].
22	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the 16th European Conference on Computer Vision. Glasgow: Springer, 2020: 213-229.
23	DENG Y T, KANERVISTO A, LING J, et al. Image-to-markup generation with coarse-to-fine attention[C]//Proceedings of the 34th International Conference on Machine Learning. [S.l.]: JMLR, 2016: 980-989.
24	HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. (2018-08-13)[2023-01-15].
25	ZHANG J S, DU J, YANG Y X, et al. A tree-structured decoder for image-to-markup generation[C]//Proceedings of the International Conference on Machine Learning (ICML). [S.l.]: PMLR, 2020: 11076-11085.
26	YUAN Y, LIU X, DIKUBAB W, et al. Syntax-aware network for handwritten mathematical expression recognition[EB/OL]. (2022-06-18)[2023-02-01].

CROHME training and test sets (number of expressions)
Training set	Test set	Number of expressions
CROHME 2014 dataset		8 835
	CROHME 2014 test set	986
	CROHME 2016 test set	1 147
	CROHME 2019 test set	1 199

Number of expressions in HME100K by difficulty
Dataset	Easy	Medium	Hard	Total
HME100K	7 721	10 450	6 436	24 607
HME100K-sub	6 155	7 026	4 346	17 527

Expression recognition rate A_cc/% of BTTR variants (BTTR-MS: with multi-scale feature fusion)
Model	CROHME 2014	CROHME 2016	CROHME 2019
BTTR	53.96	52.31	52.96
BTTR-MSLoss	55.84	51.44	52.29
BTTR-MS	55.25	52.31	53.72
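The BTTR-MS row adds multi-scale feature fusion to the BTTR encoder. How the scales are combined is not stated in this excerpt, so the sketch below shows one common pattern, top-down fusion in the style of feature pyramid networks: a coarse, low-resolution map is upsampled by nearest neighbour and added element-wise to a finer map. Single-channel maps, average pooling as a stand-in for a deeper encoder stage, and all function names are assumptions for illustration, not the authors' implementation.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2-D feature map


def avg_pool(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling; assumes both sides are divisible by k.

    Used here only as a hypothetical stand-in for a deeper, coarser encoder stage.
    """
    h, w = len(x), len(x[0])
    return [
        [sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(0, w, k)]
        for i in range(0, h, k)
    ]


def upsample_nearest(x: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat every value `factor` times in each direction."""
    return [[v for v in row for _ in range(factor)] for row in x for _ in range(factor)]


def fuse_top_down(fine: Grid, coarse: Grid) -> Grid:
    """Upsample the coarse map to the fine map's size and add the two element-wise."""
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[a + b for a, b in zip(r_fine, r_up)] for r_fine, r_up in zip(fine, up)]


fine = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]  # toy 4x4 map
coarse = avg_pool(fine, 2)                                           # 2x2 "context" map
fused = fuse_top_down(fine, coarse)
for row in fused:
    print(row)
```

Each fused position keeps its own fine-grained value and adds the average of the 2x2 block it belongs to, so small symbols retain detail while receiving wider context. A real encoder would typically align channel counts with learned 1x1 convolutions before adding.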

Sample recognition results of BTTR and BTTR-MS
Expression image	BTTR	BTTR-MS
	x - y	x. y
	3 _ { u } , 5 u _ { 6 } \ldots	3 , 4 , 5 , 6 , \ldots
	c 0 5 2 \alpha	\cos 2 \alpha
	4 X 4 + 4 + 4	4 \times 4 + 4 + 4
	\lim _ { z \rightarrow z _ { 0 } f ( z )	\lim _ { z \rightarrow z _ { 0 } } f ( z )
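In the last row, the BTTR output is not even valid LaTeX: the subscript opened by `\lim _ {` is never closed, whereas BTTR-MS produces a balanced string. A quick way to screen predictions for this kind of structural error is a brace-depth counter. The sketch below is a simple illustration, not part of the paper's method; the function name is hypothetical, and it skips escaped characters such as `\{`.

```python
def braces_balanced(latex: str) -> bool:
    """Return True if every unescaped '{' has a matching '}' in a LaTeX string."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character after it (e.g. \{ or \})
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a closing brace with no opener
                return False
        i += 1
    return depth == 0


bttr = r"\lim _ { z \rightarrow z _ { 0 } f ( z )"
bttr_ms = r"\lim _ { z \rightarrow z _ { 0 } } f ( z )"
print(braces_balanced(bttr), braces_balanced(bttr_ms))  # False True
```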

Expression recognition rate A_cc/% with mutual learning (Dual Loss listed for comparison)
Model	CROHME 2014	CROHME 2016	CROHME 2019
Dual Loss	51.88	51.53	–
BTTR	53.96	52.31	52.96
BTTR-MSE	54.52	53.71	53.21
BTTR-KL	55.43	53.53	53.79
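The BTTR-MSE and BTTR-KL rows appear to differ in the distance used for the context matching loss between the handwritten and printed branches. The sketch below assumes each branch yields one vector per decoding step (a hypothetical "context" vector; the exact inputs are not given in this excerpt) and shows an MSE variant and a symmetric KL variant, with KL taken between softmax-normalised vectors. The symmetric form, the softmax normalisation and all names are assumptions, not the authors' confirmed design.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def softmax(v: Vec) -> List[float]:
    """Numerically stable softmax."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def kl(p: Vec, q: Vec) -> float:
    """KL(p || q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def match_mse(hw: Sequence[Vec], pr: Sequence[Vec]) -> float:
    """Mean squared difference over all decoding steps and dimensions."""
    n = sum(len(a) for a in hw)
    return sum((x - y) ** 2 for a, b in zip(hw, pr) for x, y in zip(a, b)) / n


def match_kl(hw: Sequence[Vec], pr: Sequence[Vec]) -> float:
    """Symmetric KL between softmax-normalised step vectors, averaged over steps."""
    total = 0.0
    for a, b in zip(hw, pr):
        p, q = softmax(a), softmax(b)
        total += 0.5 * (kl(p, q) + kl(q, p))
    return total / len(hw)


# Two decoding steps with 2-D vectors per branch (toy values).
hw = [[1.0, 2.0], [0.5, -1.0]]
pr = [[1.5, 2.0], [0.5, 0.0]]
print(match_mse(hw, pr))  # 0.3125
print(match_kl(hw, pr))
```

MSE compares the vectors directly, whereas the KL variant compares them as distributions and therefore ignores a constant offset added to every component (softmax is shift-invariant). Either value would be added, with some weight, to the decoder losses of both branches.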

Comparison with other methods on CROHME (%; A_cc: expression recognition rate; E_rr1, E_rr2: recognition rate allowing at most one or two symbol-level errors)
Dataset	Model	A_cc	E_rr1	E_rr2
CROHME 2014	WAP	46.55	61.15	65.21
	DWAP	50.10	–	–
	DWAP-MSA	52.80	68.10	72.00
	DWAP-TD	49.10	64.20	67.80
	PAL	39.66	56.80	65.11
	PAL-v2	48.88	64.50	69.78
	Dual Loss	51.88	–	–
	BTTR	53.96	66.02	70.28
	ABM	56.85	73.73	81.24
	SAN	56.20	72.60	79.20
	Ours	56.36	72.89	81.12
	Ours*	58.88	74.92	84.14
CROHME 2016	WAP	44.55	57.10	61.55
	DWAP	47.50	–	–
	DWAP-MSA	50.10	63.80	67.40
	DWAP-TD	48.50	62.30	65.30
	PAL-v2	49.61	64.08	70.27
	Dual Loss	51.53	–	–
	BTTR	52.31	63.90	68.61
	ABM	52.92	69.66	78.73
	SAN	53.60	69.60	76.80
	Ours	53.88	71.40	79.77
	Ours*	55.10	69.83	78.64
CROHME 2019	DWAP-TD	51.40	66.10	69.10
	BTTR	52.96	65.97	69.14
	ABM	53.96	71.06	78.65
	SAN	53.50	69.30	70.10
	Ours	55.21	73.23	80.82
	Ours*	57.05	73.39	79.89
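A_cc counts predictions that match the ground truth exactly, while E_rr1 and E_rr2 are usually read as the share of expressions recognised with at most one or two symbol-level errors. The sketch below assumes space-separated LaTeX tokens and Levenshtein distance over those tokens; the function names and the toy predictions (partly modelled on the sample errors shown earlier) are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between token sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def expression_rates(preds: List[str], refs: List[str]) -> Tuple[float, ...]:
    """Return (A_cc, E_rr1, E_rr2) in percent: share of expressions with <=0, <=1, <=2 token errors."""
    dists = [edit_distance(p.split(), r.split()) for p, r in zip(preds, refs)]
    n = len(dists)
    return tuple(100.0 * sum(d <= k for d in dists) / n for k in (0, 1, 2))


refs = [r"x - y", r"4 \times 4 + 4 + 4", r"x ^ { 2 }", r"\cos 2 \alpha"]
preds = [r"x - y", r"4 X 4 + 4 + 4", r"x ^ 2", r"c 0 5 2 \alpha"]
print(expression_rates(preds, refs))  # (25.0, 50.0, 75.0)
```

The four toy predictions have token distances 0, 1, 2 and 3, so each looser threshold admits one more expression.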

Results on HME100K by expression difficulty (%)
Difficulty	A_cc	E_rr1	E_rr2	W_er
Easy	75.30	90.51	94.36	4.69
Medium	61.81	80.65	88.99	4.57
Hard	43.62	62.64	71.60	7.63
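W_er is taken here to be a word (token) error rate. One common definition, assumed in this sketch, divides the total token edit distance over all expressions by the total number of reference tokens (corpus level); the paper may instead average per expression. Function names and data are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between token sequences (unit-cost edits)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def token_error_rate(preds: List[str], refs: List[str]) -> float:
    """Corpus-level error rate in percent: total edit distance / total reference tokens."""
    errors = sum(edit_distance(p.split(), r.split()) for p, r in zip(preds, refs))
    total = sum(len(r.split()) for r in refs)
    return 100.0 * errors / total


refs = ["a + b", "x ^ { 2 }"]
preds = ["a + b", "x ^ 2"]
print(token_error_rate(preds, refs))  # 25.0
```

Unlike A_cc, this measure gives partial credit: the second prediction misses two of five tokens, so the corpus error is 2 / 8 tokens even though only half the expressions are exactly right.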
