Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec

YANG Chunling, LÜ Zeyu

doi:10.12141/j.issn.1000-565X.220221

Journal of South China University of Technology (Natural Science Edition)

2022, Vol. 50, Issue 10: 51-61

Electronics, Communication & Automatic Control

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China

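YANG Chunling (born 1970), female, professor. Her research covers image/video compression coding, image quality assessment, and image/video compressed sensing reconstruction.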

Received: 2022-04-20

Published online: 2022-10-10

Supported by the Natural Science Foundation of Guangdong Province (2017A030311028)

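Cite this article

YANG Chunling, LÜ Zeyu. Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec [J]. Journal of South China University of Technology (Natural Science Edition), 2022, 50(10): 51-61. DOI: 10.12141/j.issn.1000-565X.220221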

Abstract

Traditional video coding methods are widely used, and deep learning-based video compression has attracted growing attention as a way to further improve compression performance. Existing deep learning-based methods perform motion compensation with optical flow, and the optical flow alignment step produces artifacts that reduce prediction accuracy. This paper proposes estimating motion in the deep feature domain and designs a neural network that extracts motion information there. Building on this, it proposes a multi-layer multi-hypothesis motion compensation network that applies multi-hypothesis prediction modules at three levels (the deep feature domain, the shallow feature domain, and the pixel domain) to improve the accuracy of motion compensation and, in turn, the overall rate-distortion performance. Simulation results show that the inter-frame prediction of the proposed algorithm mitigates artifacts and looks clearly better than optical flow alignment. The proposed algorithm also achieves better rate-distortion performance than the traditional H.264 and H.265 methods and the deep learning-based single-reference-frame methods DVC and DVCpro, at both high and low bit rates. Compared with the state-of-the-art DCVC method, it reduces encoding time by about 26.8% at similar rate-distortion performance. With the H.264 encoding result as the baseline and at the same bit rate, decoding quality improves by 3.73 dB, 4.76 dB, and 2.65 dB on the HEVC test sequences ClassB, ClassD, and ClassE, respectively. These results indicate that, when compressing video sequences, the proposed algorithm improves the accuracy of the motion-compensated prediction frame, reduces the prediction error, shortens the bitstream needed to code the residual signal, and improves the overall rate-distortion performance.

Key words: video compression; deep learning; motion estimation; multi-hypothesis prediction; codec network
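
As a rough illustration of why combining several hypotheses can help prediction, the sketch below fuses four noisy candidate predictions of the same frame by per-pixel averaging and compares the PSNR against a single prediction. It is not the network described in this paper: the synthetic frame, the independent Gaussian noise model, the equal weights, and the function names (`psnr`, `fuse_hypotheses`, `mse_ratio_from_gain`) are all assumptions made for the example. The last helper converts a PSNR gain in dB, such as the gains reported in the abstract, into the equivalent factor by which the mean squared error shrinks.

```python
import math
import random


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def fuse_hypotheses(hypotheses, weights):
    """Weighted per-pixel average of several candidate predictions."""
    total = sum(weights)
    return [
        sum(w * h[i] for w, h in zip(weights, hypotheses)) / total
        for i in range(len(hypotheses[0]))
    ]


def mse_ratio_from_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


rng = random.Random(0)
# Synthetic stand-in for the frame to be predicted (assumption, not real video).
frame = [rng.uniform(0, 255) for _ in range(10_000)]
# Four hypothetical predictions, each corrupted by independent Gaussian noise.
hypotheses = [[p + rng.gauss(0, 8.0) for p in frame] for _ in range(4)]

single = psnr(frame, hypotheses[0])
fused = psnr(frame, fuse_hypotheses(hypotheses, [1.0, 1.0, 1.0, 1.0]))
print(f"single hypothesis: {single:.2f} dB, fused: {fused:.2f} dB")

for gain in (3.73, 4.76, 2.65):
    ratio = mse_ratio_from_gain(gain)
    print(f"{gain} dB gain -> MSE reduced by a factor of {ratio:.2f}")
```

Averaging only helps this much because the errors in the toy hypotheses are independent. A learned fusion module would instead have to cope with correlated errors, which is one reason the equal weights here should be read as a placeholder.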
