Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (5): 104-113. doi: 10.12141/j.issn.1000-565X.220623

Special topic: Electronics, Communication and Automatic Control (2023)

• Electronics, Communication and Automatic Control •

Design and Implementation of Hardware Structure for Online Learning of Spiking Neural Networks Based on FPGA Parallel Acceleration

LIU Yijun1 CAO Yu2 YE Wujian1 LIN Ziqi2

  1. School of Integrated Circuits, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  2. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2022-09-27 Online: 2023-05-25 Published: 2022-12-05
  • Contact: YE Wujian (b. 1987), male, Ph.D., lecturer; his research focuses on brain-inspired computing and deep learning applications. E-mail: yewjian@gdut.edu.cn
  • About author: LIU Yijun (b. 1977), male, Ph.D., professor and doctoral supervisor; his research focuses on integrated circuit design, brain-inspired computing, and deep learning. E-mail: yjliu@gdut.edu.cn
  • Supported by:
    the Key-Area Research and Development Program of Guangdong Province (2018B030338001); the Basic and Applied Basic Research Project of the Guangzhou Basic Research Program (202201010595); the Innovative Talents Project of the Department of Education of Guangdong Province; and the "Hundred Young Talents Program" of Guangdong University of Technology (220413548)


Abstract:

At present, digital-circuit hardware designs for spiking neural networks (SNNs) offer little synaptic parallelism in their learning function, which leads to large overall hardware latency and, to some extent, limits the speed of online learning in SNN models. To address this problem, this paper proposes an efficient hardware architecture for SNN online learning based on FPGA parallel acceleration, which speeds up both training and inference through a dual parallel design of neurons and synapses. First, a synapse structure with parallel spike delivery and parallel spike-timing-dependent plasticity (STDP) learning is designed; then an input encoding layer and a winner-take-all (WTA) learning layer are built, and the implementation of lateral inhibition in the WTA network is optimized, yielding an SNN model with a 784-400 topology. Experimental results on the MNIST dataset show that, with this hardware structure, the SNN model trains one image in 1.61 ms at an energy cost of about 3.18 mJ and infers one image in 1.19 ms at about 2.37 mJ, reaching an accuracy of 87.51% on the MNIST test set. Under the proposed hardware framework, the parallel synapse structure improves training speed by more than 38% and reduces hardware energy consumption by about 24.1%, which helps promote the development of edge intelligent computing devices and technologies.
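To make the training loop described above concrete, the following is a minimal software sketch (not the paper's FPGA implementation) of one training pass through a 784-input, 400-neuron spiking layer: Poisson rate encoding of the input image, leaky integrate-and-fire neurons updated in parallel, hard winner-take-all lateral inhibition, and a simplified trace-based STDP rule. All constants (time constants, threshold, learning rates) and the exact STDP form are illustrative assumptions, not values from the paper.

```python
# Illustrative reference model of the abstract's training flow; all
# hyperparameters below are assumed values, not the paper's design.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_OUT, T_STEPS = 784, 400, 100   # 784-400 topology, 100 time steps
TAU_V, V_TH = 20.0, 0.5                # membrane decay constant and threshold
TAU_TRACE = 20.0                       # presynaptic STDP trace time constant
A_PLUS, A_MINUS = 0.01, 0.012          # potentiation / depression step sizes

W = rng.uniform(0.0, 0.1, size=(N_OUT, N_IN))  # synaptic weight matrix

def train_image(image, W):
    """One online-learning pass over a single image (pixels in [0, 1])."""
    v = np.zeros(N_OUT)        # membrane potentials of the learning layer
    x_pre = np.zeros(N_IN)     # presynaptic spike traces for STDP
    for _ in range(T_STEPS):
        # Poisson rate encoding: brighter pixels spike more often.
        pre_spikes = rng.random(N_IN) < image * 0.5
        x_pre = x_pre * np.exp(-1.0 / TAU_TRACE) + pre_spikes

        # LIF update; the matrix product models all synapses being
        # evaluated in parallel, as the hardware's parallel datapaths do.
        v = v * np.exp(-1.0 / TAU_V) + W @ pre_spikes
        winner = int(np.argmax(v))
        if v[winner] >= V_TH:
            # Hard WTA lateral inhibition: only the winner fires and
            # every neuron's potential is reset.
            v[:] = 0.0
            # Simplified STDP on the winner's synapses: potentiate
            # recently active inputs, depress inputs with no trace.
            W[winner] += A_PLUS * x_pre - A_MINUS * (x_pre == 0)
            np.clip(W[winner], 0.0, 1.0, out=W[winner])
    return W
```

With repeated calls over a labeled image stream, each output neuron's weight row drifts toward a frequently winning input pattern; the hard reset of all potentials is one common way to realize WTA lateral inhibition cheaply, which is the component the paper reports optimizing.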

Key words: neural network, learning algorithm, acceleration, parallel architecture

CLC number: