Journal of South China University of Technology (Natural Science Edition)

• Computer Science and Technology •

Quality Enhancement Algorithm for Video Reconstructed Pictures Based on Multi-Feature Incremental Learning

丁丹丹1 陈靖森1 费加罗1 佟骏超1 潘志庚1,2 姚争为1   

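  1. School of Information Science and Engineering, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China; 2. Guangzhou NINED LLC, Guangzhou 511400, Guangdong, China
  • Received: 2018-08-25 Online: 2018-12-25 Published: 2018-11-01
  • Corresponding author: DING Dandan (born 1983), female, lecturer; research interests: video and image processing, video coding. E-mail: DandanDing@hznu.edu.cn
  • About the author: DING Dandan (born 1983), female, lecturer; research interests: video and image processing, video coding.
  • Funding:
    National Key R&D Program of China (2017YFB1002803); National-Level College Students' Innovation and Entrepreneurship Training Program (201810346015)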

A Multi-Feature Incremental Learning Neural Network for the Quality Enhancement of Video Reconstructed Pictures in H.265/HEVC

DING Dandan1 CHEN Jingsen1 FEI Jialuo1 TONG Junchao1 PAN Zhigeng1,2 YAO Zhengwei1   

  1. School of Information Science and Engineering, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China; 2. Guangzhou NINED LLC, Guangzhou 511400, Guangdong, China
  • Received: 2018-08-25 Online: 2018-12-25 Published: 2018-11-01
  • Contact: DING Dandan (born 1983), female, lecturer; research interests: video and image processing, video coding. E-mail: DandanDing@hznu.edu.cn
  • About author: DING Dandan (born 1983), female, lecturer; research interests: video and image processing, video coding.
  • Supported by:
    National Key R&D Program of China (Grant 2017YFB1002803) and the National-Level College Students' Innovative Entrepreneurial Training Plan Program (Grant 201810346015)

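Abstract: The new-generation video coding standard H.265/HEVC uses deblocking filtering and sample adaptive offset filtering to remove blocking artifacts from reconstructed video pictures and to reduce distortion. Both techniques derive from signal processing theory and rely on hand-designed algorithms and parameters, so they cannot fully exploit the rich and complex characteristics of natural video. This paper recasts the in-loop filtering problem in video coding as an end-to-end regression problem and uses a convolutional neural network to learn automatically the complex mapping between reconstructed pictures and original pictures, reducing the error between them and thereby improving coding efficiency. The proposed multi-feature incremental learning network has 35 layers. The whole network adopts global residual learning; by cascading multi-feature incremental learning blocks in series, it repeatedly extracts, selects and strengthens useful features, improving the network's perceptual and learning ability. Locally, within each incremental learning block, multi-scale convolution kernels are designed and, drawing on the idea of dense networks, features from every level are fully exploited so that information propagates thoroughly between layers. Experimental results show that this structure, which combines dense and sparse connections, effectively improves the learning ability of the network, generalizes well, and clearly enhances the quality of reconstructed pictures in video coding. When the proposed model replaces the in-loop filter of H.265/HEVC under the All Intra Main configuration, the luma component obtains BD-rate savings of up to 11.12% and 6.32% on average. When the model is used for in-loop filtering in H.265/HEVC, the BD-rate is reduced by 5.24% on average.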

Keywords: H.265/HEVC, in-loop filtering, convolutional neural network, incremental learning

Abstract: The new-generation video coding standard H.265/HEVC employs an in-loop filter, comprising a deblocking filter (DBF) and a sample adaptive offset (SAO) filter, to remove blocking artifacts and reduce the distortion of reconstructed video frames. Both DBF and SAO originate from signal processing theory, and their algorithms and parameters are designed and set manually. Although their computational complexity is relatively low, such filters may not handle diverse content well, because natural video is far more complex. This paper formulates the loop-filtering problem in video coding as an end-to-end regression problem that can be solved by a deep neural network. The mapping between reconstructed frames and original frames is learned automatically, so that the difference between them is minimized. The proposed Multi-Feature based Incremental Learning Network (MFILNet) has 35 layers. The overall network adopts a global residual learning strategy and cascades several Feature Incremental Learning Blocks (FIBs) to extract features at different levels. Useful features are thereby extracted, selected and enhanced, which improves the perceptual ability of the network. Within each FIB, convolutional kernels of varying sizes are adopted. Inspired by DenseNet, features from different layers are fused to facilitate information flow among layers. Experimental results show that this combination of dense and sparse connectivity substantially improves the learning and generalization capability of the proposed network. Both the objective and the subjective quality of compressed video frames are improved significantly. When the proposed model replaces the DBF and SAO in H.265/HEVC, a BD-rate reduction of up to 11.2% and 6.32% on average is obtained. When the model is instead applied after the DBF and SAO, an average BD-rate saving of 5.24% is obtained.

Key words: H.265/HEVC, in-loop filter, convolutional neural network, incremental learning
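
The two structural ideas named in the abstract, global residual learning and dense feature reuse inside each block with kernels of more than one size, can be illustrated with a toy sketch in plain Python. This is not the authors' MFILNet: the function names, the two-layer block, the 3x3 and 5x5 kernel sizes, the block count and the constant "box" kernels standing in for learned weights are all assumptions made purely for illustration.

```python
def conv_same(img, kernel):
    """Zero-padded 'same' cross-correlation of one channel with an odd square kernel."""
    h, w, k = len(img), len(img[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[dy][dx] * img[yy][xx]
    return out


def box_kernel(size, weight):
    """Hypothetical stand-in for a learned kernel: constant taps summing to `weight`."""
    return [[weight / (size * size)] * size for _ in range(size)]


def layer(features, size, weight, relu=True):
    """One conv layer that reads ALL feature maps passed in, with optional ReLU."""
    h, w = len(features[0]), len(features[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for fmap in features:
        resp = conv_same(fmap, box_kernel(size, weight / len(features)))
        for y in range(h):
            for x in range(w):
                acc[y][x] += resp[y][x]
    if relu:
        acc = [[max(0.0, v) for v in row] for row in acc]
    return acc


def incremental_block(features, kernel_sizes=(3, 5), weight=0.5):
    """Dense connectivity: each new map is appended, so later layers see every earlier map.
    The kernel sizes are assumed; the abstract only says the kernels vary in size."""
    features = list(features)
    for size in kernel_sizes:
        features.append(layer(features, size, weight))
    return features


def enhance(reconstructed, num_blocks=2, weight=0.5):
    """Global residual learning: predict a correction map and add it to the input picture."""
    features = [reconstructed]
    for _ in range(num_blocks):
        features = incremental_block(features, weight=weight)
    residual = layer(features, 3, weight, relu=False)  # linear fusion into one residual map
    return [[p + r for p, r in zip(row, res_row)]
            for row, res_row in zip(reconstructed, residual)]


def mse(a, b):
    """Mean squared error between two equally sized pictures (the regression target)."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)
```

Because the network only predicts a residual, setting every weight to zero makes `enhance` return its input unchanged. In a trained model the weights would instead be fitted so that `mse(enhance(reconstructed), original)` is as small as possible.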