Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec

doi:10.12141/j.issn.1000-565X.220221

Journal of South China University of Technology (Natural Science Edition), 2022, Vol. 50, Issue (10): 51-61

Special topic: Electronics, Communication and Automatic Control (2022)

YANG Chunling, LÜ Zeyu

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China

Received: 2022-04-20  Published: 2022-10-25  Online: 2022-10-10
Corresponding author: YANG Chunling (born 1970), female, professor; her research covers image/video compression coding, image quality assessment, and image/video compressed sensing reconstruction. E-mail: eeclyang@scut.edu.cn
Supported by:
the Natural Science Foundation of Guangdong Province (2017A030311028)

Abstract:

Traditional video coding methods are widely used, and deep learning-based video compression has attracted growing attention as a way to further improve compression performance. Existing deep learning-based methods perform motion compensation with optical flow, but the optical flow alignment process produces artifacts that reduce prediction accuracy. This paper proposes performing motion estimation in the deep feature domain and designs a corresponding neural network to extract motion information there. On this basis, it proposes a multi-layer multi-hypothesis prediction motion compensation network: multi-hypothesis prediction modules are applied at three levels (the deep feature domain, the shallow feature domain and the pixel domain) to improve the accuracy of motion compensation and hence the overall rate-distortion performance. Simulation results show that the inter-frame predictions of the proposed algorithm mitigate artifacts and that their visual quality is clearly better than that of optical flow alignment. The proposed algorithm also achieves better rate-distortion performance, at both high and low bit rates, than the traditional H.264 and H.265 methods and the deep learning-based single-frame reference methods DVC and DVCpro. Compared with the state-of-the-art DCVC method, it reduces coding time by approximately 26.8% at similar rate-distortion performance. With the H.264 encoding result as the baseline and at the same bit rate, decoding quality improves by 3.73 dB, 4.76 dB and 2.65 dB on the HEVC test sequences ClassB, ClassD and ClassE, respectively. The simulation results show that, when compressing video sequences, the proposed algorithm improves the accuracy of the motion-compensated prediction frame, reduces the prediction error, shortens the bitstream of the compressed residual signal and improves the overall rate-distortion performance.

Key words: video compression, deep learning, motion estimation, multi-hypothesis prediction, codec network
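
As a rough illustration of why blending several motion hypotheses can beat a single aligned prediction, the toy sketch below works on a 1-D signal. It is not the network described in the paper: the abstract does not say how hypotheses are generated or weighted, so the scene function `f`, the integer `shift` hypotheses, the softmax weighting in `multi_hypothesis_predict` and the uniform scores are all assumptions made for the example.

```python
import math


def f(x):
    """Smooth 1-D 'scene' used to synthesise frames (illustrative only)."""
    return 100.0 * math.exp(-((x - 4.0) ** 2) / 8.0)


def shift(frame, dx):
    """Shift a 1-D frame by an integer dx samples, repeating the edge sample."""
    n = len(frame)
    return [frame[min(max(i - dx, 0), n - 1)] for i in range(n)]


def softmax(scores):
    """Turn unnormalised scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hypothesis_predict(hypotheses, scores):
    """Blend K candidate predictions with per-sample softmax weights.

    hypotheses[k][i]: value of hypothesis k at sample i
    scores[k][i]:     unnormalised score of hypothesis k at sample i
    """
    n = len(hypotheses[0])
    prediction = []
    for i in range(n):
        weights = softmax([s[i] for s in scores])
        prediction.append(sum(w * h[i] for w, h in zip(weights, hypotheses)))
    return prediction


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


n = 9
reference = [f(i) for i in range(n)]        # previous (reference) frame
current = [f(i - 0.5) for i in range(n)]    # scene moved by half a sample

h0 = shift(reference, 0)                    # hypothesis 1: no motion
h1 = shift(reference, 1)                    # hypothesis 2: one-sample motion
uniform = [[0.0] * n, [0.0] * n]            # equal scores, i.e. plain averaging
blended = multi_hypothesis_predict([h0, h1], uniform)

print("MSE, no-motion hypothesis :", mse(h0, current))
print("MSE, one-sample hypothesis:", mse(h1, current))
print("MSE, blended prediction   :", mse(blended, current))
```

Neither integer shift can represent the half-sample motion on its own, whereas their blend approximates it, so the blended prediction has the lowest error of the three. In a learned codec the scores would come from a trained module rather than being fixed by hand.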

CLC number: TN919.8

YANG Chunling, LÜ Zeyu. Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec[J]. Journal of South China University of Technology (Natural Science Edition), 2022, 50(10): 51-61.
