Journal of South China University of Technology (Natural Science Edition) ›› 2023, Vol. 51 ›› Issue (5): 95-103. doi: 10.12141/j.issn.1000-565X.220612

Special topic: Electronics, Communication and Automatic Control, 2023

• Electronics, Communication and Automatic Control •

Parallel Pipeline Hardware Design of Intra Rate-Distortion Optimization Prediction Mode in HEVC

LIN Zhijian DING Yongqiang YANG Xiuzhi WU Linhuang

  1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, Fujian, China
  • Received: 2022-09-20 Published: 2023-05-25 Online: 2023-01-13
  • Contact: WU Linhuang (born 1984), male, Ph.D., associate researcher; research interests: video coding and computer vision. E-mail: wlh173@163.com
  • About author: LIN Zhijian (born 1984), male, Ph.D., associate professor; research interests: video coding and FPGA design. E-mail: zlin@fzu.edu.cn
  • Supported by:
    the General Program of the National Natural Science Foundation of China (61871132); the Science and Technology Innovation Team Project of Universities in Fujian Province (500190)

Abstract:

In recent years, video resolution and frame rate have risen steadily to meet the growing demand for video data. The speed of real-time video compression, however, is constrained by both: the higher the resolution and frame rate, the longer encoding takes. To enable real-time compression of video sequences at higher resolutions and frame rates, this paper designs a new parallel pipelined hardware architecture for the intra rate-distortion optimization prediction mode, supporting intra prediction coding of coding tree units (CTUs) up to 64×64. First, a scheme that processes 9 prediction modes in parallel is designed. Second, a pipelined hardware architecture is implemented that takes the 4×4 block as its basic processing unit and follows the Z-scan order; the prediction data of 32×32 prediction units are reused in place of those of 64×64 prediction units to reduce the amount of computation. Finally, a new Hadamard transform circuit is proposed for this pipelined architecture to achieve efficient pipelined processing. Experimental results show that, on an Altera Arria 10 series field programmable gate array (FPGA), the 9-way mode-parallel architecture occupies only 75 kb of look-up table and 55 kb of register resources and reaches a clock frequency of 207 MHz. Predicting one 64×64 CTU takes only 4096 clock cycles, and the design supports real-time all-I-frame encoding of 1080p video at up to 99 f/s. Compared with existing designs, the proposed scheme achieves real-time 1080p video encoding at a higher frame rate with a smaller circuit area.
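The frame-rate figure can be sanity-checked from the other numbers in the abstract. The sketch below is an illustration and not the authors' code: it assumes a 1080p frame is 1920×1080 pixels, that partial CTUs at the frame edge cost a full 4096 cycles, and that no other overhead applies. The function names are hypothetical. It also lists 4×4 block origins in Z-scan (Morton) order, which is one common way to express the scan the abstract describes.

```python
import math

def ctus_per_frame(width, height, ctu=64):
    """CTUs needed to cover a frame, counting partial edge CTUs as whole ones."""
    return math.ceil(width / ctu) * math.ceil(height / ctu)

def max_fps(clock_hz, cycles_per_ctu, width, height, ctu=64):
    """Upper bound on frames per second if every CTU costs a fixed cycle count."""
    return clock_hz / (cycles_per_ctu * ctus_per_frame(width, height, ctu))

def z_order_blocks(ctu=64, blk=4):
    """Pixel origins (x, y) of blk x blk blocks inside one CTU, in Z-scan order.

    The Z-scan index interleaves the bits of the block column (even bit
    positions) and block row (odd bit positions), so de-interleaving the
    index recovers the block coordinates.
    """
    n = ctu // blk                 # blocks per CTU side (16 for 64/4)
    bits = n.bit_length() - 1      # bits per coordinate (4 for n = 16)
    origins = []
    for idx in range(n * n):
        x = y = 0
        for b in range(bits):
            x |= ((idx >> (2 * b)) & 1) << b
            y |= ((idx >> (2 * b + 1)) & 1) << b
        origins.append((x * blk, y * blk))
    return origins

if __name__ == "__main__":
    # Assumed frame size for 1080p: 1920x1080, i.e. 30 x 17 = 510 CTUs.
    print(ctus_per_frame(1920, 1080))
    # 207 MHz clock and 4096 cycles per CTU, both taken from the abstract.
    print(round(max_fps(207e6, 4096, 1920, 1080), 2))
    # First four 4x4 blocks visited: top-left, right, below-left, below-right.
    print(z_order_blocks()[:4])
```

Under these assumptions the bound comes out at roughly 99.09 frames per second, consistent with the 99 f/s reported in the abstract.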

Key words: intra prediction, field programmable gate array, mode in parallel, high efficiency video coding

CLC number: