Journal of South China University of Technology (Natural Science Edition) ›› 2012, Vol. 40 ›› Issue (6): 97-102.

• Computer Science and Technology •

Implementation of Extended Instruction Set for AES Fast Algorithm

Feng Bin  Qi De-yu   

  1. Research Institute of Computer Systems, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2011-12-27 Revised: 2012-03-26 Online: 2012-06-25 Published: 2012-05-03
  • Contact and author biography: Feng Bin (born 1974), male, Ph.D. candidate and senior engineer; his research covers embedded systems and high-performance computing. E-mail: billfeng126@126.com
  • Supported by:

    National Natural Science Foundation of China (61070015); Team Project of the Natural Science Foundation of Guangdong Province (10351806001000000)

Abstract:

Based on the fast AES algorithm proposed by Daemen et al., two schemes are presented for hardware acceleration through extended (custom) instructions on the configurable Nios II processor. The first stores the fast algorithm's lookup table in on-chip memory; the second implements the S-box as a hardware logic circuit and computes the corresponding lookup-table entries from it. Both replace the computation-intensive AES round transformation with lookups into the forward table. In the first scheme, the forward lookup table is placed in on-chip memory and 12 extended instructions carry out key expansion, the round transformation, and the final round; the S-box needed by the final round is obtained by masking entries of the forward lookup table. This scheme is then optimized to eliminate on-chip memory usage: the logical relationship between the S-box and the forward lookup table is derived, and the S-box is realized as a logic circuit based on inversion of finite-field elements, which enhances system security and reduces power consumption. Finally, several implementations, including the extended instruction sets and a coprocessor, are tested and compared. The results show that, compared with a structurally optimized pure-software implementation of the fast AES algorithm, the proposed scheme achieves a speedup of 2.47 at the cost of only 223 additional logic elements (LEs).
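The relationships summarized in the abstract can be illustrated with a small software model. The Python sketch below is not the authors' hardware design or their 12 custom instructions; it is a minimal model under stated assumptions. It derives the S-box from inversion in GF(2^8) followed by the standard AES affine map, builds one forward lookup table from it, recovers the S-box by masking a table entry, and uses table lookups for key expansion, the main rounds, and the final round of AES-128. All function and table names (`sbox_byte`, `TE0`, `te`, `sbox_from_table`, and so on) are hypothetical, and the byte order of the table words (2·S, S, S, 3·S from most to least significant byte) follows a common public-domain layout that the paper may or may not use.

```python
def xtime(a):
    """Multiply by 0x02 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def gf_mul(a, b):
    """Shift-and-add multiplication of two GF(2^8) elements."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def gf_inv(a):
    """Inverse computed as a^254; 0 maps to 0, as AES requires."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_byte(a):
    """S-box = affine transform of the field inverse (no stored table)."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def ror32(w, n):
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF

# Forward table (assumed layout): TE0[x] = (2*S[x], S[x], S[x], 3*S[x]).
TE0 = []
for x in range(256):
    s = sbox_byte(x)
    TE0.append((gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3))

def te(i, x):
    """The other three forward tables are byte rotations of TE0."""
    return ror32(TE0[x], 8 * i)

def sbox_from_table(x):
    """Recover S[x] by masking one byte of the forward-table entry."""
    return (TE0[x] >> 8) & 0xFF

def sub_word(w):
    return sum(sbox_from_table((w >> sh) & 0xFF) << sh for sh in (24, 16, 8, 0))

def expand_key(key):
    """AES-128 key schedule: 16 key bytes -> 44 round-key words."""
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            # RotWord is a left rotation by 8 bits, i.e. ror32 by 24.
            t = sub_word(ror32(t, 24)) ^ (rcon << 24)
            rcon = xtime(rcon)
        w.append(w[i - 4] ^ t)
    return w

def encrypt_block(block, w):
    """Encrypt one 16-byte block using only forward-table lookups."""
    s = [int.from_bytes(block[4 * i:4 * i + 4], "big") ^ w[i] for i in range(4)]
    for r in range(1, 10):  # nine main rounds: four lookups per column
        s = [te(0, s[c] >> 24)
             ^ te(1, (s[(c + 1) % 4] >> 16) & 0xFF)
             ^ te(2, (s[(c + 2) % 4] >> 8) & 0xFF)
             ^ te(3, s[(c + 3) % 4] & 0xFF)
             ^ w[4 * r + c] for c in range(4)]
    # Final round has no MixColumns, so only the masked S-box byte is used.
    out = [((sbox_from_table(s[c] >> 24) << 24)
            | (sbox_from_table((s[(c + 1) % 4] >> 16) & 0xFF) << 16)
            | (sbox_from_table((s[(c + 2) % 4] >> 8) & 0xFF) << 8)
            | sbox_from_table(s[(c + 3) % 4] & 0xFF)) ^ w[40 + c]
           for c in range(4)]
    return b"".join(x.to_bytes(4, "big") for x in out)

if __name__ == "__main__":
    key = bytes(range(16))
    plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
    print(encrypt_block(plaintext, expand_key(key)).hex())
```

The key and plaintext in the demo are those of the AES-128 example in FIPS 197, Appendix C.1, so the printed ciphertext can be compared with the published value. In this model `TE0` is still held as a Python list for convenience; the memory-free variant described in the abstract corresponds to evaluating `sbox_byte` in logic and forming the 2·S and 3·S bytes on the fly instead of storing the table.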

Key words: fast AES algorithm, extended instruction set, S-box, finite field, Nios II processor, speedup

CLC number: