Parallel Optimization and Comparison of the Conjugate Gradient Method on GPU and Xeon Phi

Journal of South China University of Technology (Natural Science Edition), 2015, Vol. 43, Issue 11: 35-46, 53. doi: 10.3969/j.issn.1000-565X.2015.11.006

Huang Min¹, Ding Ping¹,², Luo Hai-biao²

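1. School of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. Intelligent Video Laboratory, Institute of Software Application Technology, Guangzhou & Chinese Academy of Sciences, Guangzhou 511458, Guangdong, China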

Received: 2015-03-10; Revised: 2015-06-07; Online: 2015-11-25; Published: 2015-10-01
Corresponding author: Huang Min (born 1976), female, Ph.D., associate professor; research interests: parallel computing and mobile cloud computing. E-mail: minh@scut.edu.cn
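Supported by:
Guangdong Province Special Project for Public Welfare Research and Capacity Building (2014A040401018); Guangdong Province Program for Promoting the Development of the Science and Technology Service Industry (2013B040404009);
Guangdong Provincial Key Laboratory of New Media and Brand Communication Innovative Application (2013WSYS0002)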

Abstract

Abstract: To harness the computing power of multi-core processors and meet the demands of highly parallel applications, a parallel conjugate gradient algorithm is proposed for solving the linear equations of large-scale sparse matrices. On the GPU, the algorithm exploits the multi-level memory hierarchy and applies optimizations such as thread-to-matrix mapping, coalesced memory access and data reuse, and it uses efficient thread scheduling to hide the high latency of global memory accesses. On the Xeon Phi, it exploits the processor's high degree of parallelism and optimizes data communication and transfer, the reduction of data dependences, vectorization and asynchronous computation, again using efficient thread scheduling to hide the high latency of global memory accesses. Experiments on both platforms verify the feasibility and correctness of the algorithm and compare its running efficiency under the two approaches. The conjugate gradient method is found to achieve a better speedup on the GPU than on the Xeon Phi.

Key words: conjugate gradient method, graphics processing unit, Xeon Phi, parallel optimization, sparse matrix-vector multiplication
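For orientation, the basic iteration that the abstract refers to can be sketched serially. This is a minimal illustration and not the authors' GPU or Xeon Phi implementation: it assumes a symmetric positive definite matrix stored in CSR (compressed sparse row) format, which the abstract does not specify, and the function names `csr_spmv`, `dot` and `conjugate_gradient` are hypothetical. The sparse matrix-vector product inside the loop is the kernel that dominates the cost of each iteration and is therefore the natural target for optimizations such as coalesced access and vectorization.

```python
import math


def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form.

    Each output row is computed independently of the others, which is
    why this kernel maps naturally onto one thread (or thread group)
    per row on a parallel device.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product; on a parallel device this becomes a reduction."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual r = b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = csr_spmv(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)                     # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                          # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration consists of one sparse matrix-vector product, two inner products and three vector updates. The inner products are global reductions, so they are the points at which a parallel version has to synchronize.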

Huang Min, Ding Ping, Luo Hai-biao. Parallel Optimization and Comparison of the Conjugate Gradient Method on GPU and Xeon Phi[J]. Journal of South China University of Technology (Natural Science Edition), 2015, 43(11): 35-46,53.
