Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec

YANG Chunling, LÜ Zeyu

doi:10.12141/j.issn.1000-565X.220221

Journal of South China University of Technology (Natural Science), 2022, Vol. 50, Issue 10: 51-61

Electronics, Communication & Automation Technology

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China

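YANG Chunling (born 1970), female, professor. Her research focuses on image/video compression coding, image quality assessment, and image/video compressed sensing reconstruction.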

Received date: 2022-04-20

Online published: 2022-10-10

Supported by the Natural Science Foundation of Guangdong Province (2017A030311028)

Abstract

Traditional video compression coding methods are widely used, and deep learning-based video compression has attracted increasing attention as a way to further improve compression performance. Existing deep learning-based methods perform motion compensation with optical flow, but the optical flow alignment process produces artifacts that reduce prediction accuracy. This paper proposes motion estimation in the deep feature domain and designs a corresponding neural network to extract motion information there. On this basis, it proposes a multi-layer multi-hypothesis motion compensation network. Applying multi-hypothesis prediction modules in the deep feature domain, the shallow feature domain, and the pixel domain improves the accuracy of motion compensation and thereby the overall rate-distortion performance. Simulation results show that the inter-frame predictions of the proposed algorithm mitigate artifacts and are visually significantly better than those of optical flow alignment. The proposed algorithm also achieves better rate-distortion performance than the traditional H.264 and H.265 methods and the deep learning-based single-reference-frame methods DVC and DVCpro. Compared with the state-of-the-art DCVC method, it reduces coding time by approximately 26.8% with similar rate-distortion performance. With the H.264 encoding result as the benchmark, at the same bit rate the decoding quality improves by 3.73 dB, 4.76 dB, and 2.65 dB on the HEVC test sequences ClassB, ClassD, and ClassE, respectively.
The simulation results show that, when compressing video sequences, the proposed algorithm improves the accuracy of motion-compensated prediction frames, reduces the prediction error, shortens the bitstream of the coded residual signal, and improves the overall rate-distortion performance.
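The idea behind multi-hypothesis prediction can be illustrated with a toy example. The sketch below is not the network described in the paper: it is a minimal pixel-domain illustration, assuming two hand-made hypotheses (integer shifts of a reference frame) that are fused with fixed weights. All function names and the test data are hypothetical. It shows that when the true motion lies between two integer displacements, a fused prediction can have lower error than either single hypothesis.

```python
import math


def shift_frame(frame, dx):
    """Shift a 2-D frame dx pixels to the right, replicating border pixels."""
    width = len(frame[0])
    return [[row[min(max(x - dx, 0), width - 1)] for x in range(width)]
            for row in frame]


def fuse_hypotheses(hypotheses, weights):
    """Per-pixel weighted average of several prediction hypotheses."""
    total = float(sum(weights))
    height, width = len(hypotheses[0]), len(hypotheses[0][0])
    return [[sum(w * h[y][x] for h, w in zip(hypotheses, weights)) / total
             for x in range(width)]
            for y in range(height)]


def mse(a, b):
    """Mean squared error between two frames of equal size."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, peak=255.0):
    """PSNR in dB; infinite when the frames are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


# Hypothetical toy reference frame: four identical rows of eight pixels.
reference = [[10, 20, 40, 80, 160, 200, 220, 230] for _ in range(4)]

# Two hypotheses: the reference shifted by 0 and by 1 pixel.
hyp0 = shift_frame(reference, 0)
hyp1 = shift_frame(reference, 1)

# Simulated current frame with half-pixel motion (average of the two shifts).
current = [[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(hyp0, hyp1)]

# Fuse the hypotheses with equal weights (fixed here; a learned model would
# predict them).
fused = fuse_hypotheses([hyp0, hyp1], [1.0, 1.0])

single_mse = mse(hyp0, current)
fused_mse = mse(fused, current)
print("single-hypothesis MSE:", single_mse)
print("fused MSE:", fused_mse)
```

In this constructed case the fused prediction matches the current frame exactly, so the residual to be coded is zero. Real sequences are not this clean, and the paper's modules operate on learned features rather than fixed shifts and weights.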

Key words: video compression; deep learning; motion estimation; multi-hypothesis prediction; codec network

Cite this article

YANG Chunling, LÜ Zeyu. Deep Feature Domain Motion Estimation and Multi-Layer Multi-Hypothesis Motion Compensation Net for Video Compression Codec[J]. Journal of South China University of Technology (Natural Science), 2022, 50(10): 51-61. DOI: 10.12141/j.issn.1000-565X.220221

