Journal of South China University of Technology (Natural Science Edition), 2022, Vol. 50, Issue (10): 51-61. doi: 10.12141/j.issn.1000-565X.220221

Topic: 2022 Electronics, Communication and Automatic Control



Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec

YANG Chunling, LÜ Zeyu

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2022-04-20 Online: 2022-10-25 Published: 2022-10-10
  • Contact: YANG Chunling (1970-), female, professor, whose research interests include image/video compression coding, image quality assessment, and compressed-sensing reconstruction of images and videos. E-mail: eeclyang@scut.edu.cn
  • Supported by:
    the Natural Science Foundation of Guangdong Province (2017A030311028)


Abstract:

Traditional video compression coding methods are widely used, and deep-learning-based video compression has attracted growing attention as a way to further improve compression performance. Existing deep-learning video codecs perform motion compensation with optical flow, but the optical-flow alignment step produces artifacts that reduce prediction accuracy. This paper proposes performing motion estimation in the deep feature domain and designs a corresponding neural network to extract motion information there. Building on this, it proposes a multi-layer multi-hypothesis motion compensation network: multi-hypothesis prediction modules applied at three levels, namely the deep feature domain, the shallow feature domain, and the pixel domain, improve the accuracy of motion compensation and thereby the overall rate-distortion performance. Simulation results show that the proposed algorithm's inter-frame predictions have fewer artifacts and are visually clearly better than those obtained by optical-flow alignment. The algorithm also achieves better rate-distortion performance at both high and low bit rates than the traditional H.264 and H.265 codecs and the deep-learning-based single-reference-frame methods DVC and DVCpro. Compared with DCVC, a state-of-the-art method, it achieves similar rate-distortion performance while reducing coding time by about 26.8%. Taking H.264 as the baseline, at the same bit rate the decoded quality improves by 3.73 dB, 4.76 dB, and 2.65 dB on the HEVC test sequences Class B, Class D, and Class E, respectively.
These results show that, when compressing video sequences, the proposed algorithm improves the accuracy of motion-compensated prediction frames, reduces the prediction error, shortens the bitstream needed to code the residual signal, and improves the overall rate-distortion performance.
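The core contrast in the abstract, fusing several motion-compensated candidates instead of trusting a single optical-flow warp, can be illustrated with a minimal sketch. This is not the authors' network: it assumes single-channel frames, uses whole-pixel shifts as a hypothetical stand-in for learned motion, and takes per-pixel scores as given, whereas the paper's module computes its own combination (its exact form is not stated in the abstract). The names `shift` and `multi_hypothesis_predict` are illustrative only.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel frame or feature map, indexed [y][x]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: turns K scores into K weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shift(frame: Grid, dy: int, dx: int) -> Grid:
    """Hypothetical stand-in for motion compensation: moves content by (dy, dx)
    whole pixels, clamping reads at the frame border."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)] for x in range(w)]
        for y in range(h)
    ]


def multi_hypothesis_predict(hypotheses: List[Grid], scores: List[Grid]) -> Grid:
    """Fuse K candidate predictions pixel by pixel.

    scores[i][y][x] is an (assumed) confidence for hypothesis i at (y, x);
    a softmax over i turns the scores into per-pixel blending weights.
    """
    k = len(hypotheses)
    h, w = len(hypotheses[0]), len(hypotheses[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            weights = softmax([scores[i][y][x] for i in range(k)])
            out[y][x] = sum(weights[i] * hypotheses[i][y][x] for i in range(k))
    return out


if __name__ == "__main__":
    ref = [[0.0, 1.0, 2.0, 3.0]]                  # 1x4 reference frame
    hyps = [shift(ref, 0, 0), shift(ref, 0, 1)]   # two candidate motions
    equal = [[[0.0] * 4], [[0.0] * 4]]            # no preference between them
    print(multi_hypothesis_predict(hyps, equal))  # [[0.0, 0.5, 1.5, 2.5]]
```

With equal scores the fused prediction is the plain average of the hypotheses; as one score dominates, the result approaches that single hypothesis, which corresponds to the single-warp case. In the codec described above, fusion of this general kind is applied at three levels (deep features, shallow features, and pixels) rather than only on pixels as in this sketch.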

Key words: video compression, deep learning, motion estimation, multi-hypothesis prediction, codec network
