Journal of South China University of Technology(Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (10): 9-21.doi: 10.12141/j.issn.1000-565X.230578

• Electronics, Communication & Automation Technology •

Feature-Space Optimization-Inspired and Multi-Hypothesis Cross-Attention Reconstruction Neural Network for Video Compressive Sensing

YANG Chunling, CHEN Wenjun, LIU Jiahui

  1. School of Electronic and Information Engineering,South China University of Technology,Guangzhou 510640,Guangdong,China
  • Received:2023-09-14 Online:2024-10-25 Published:2024-03-14
  • About author: YANG Chunling (b. 1970), female, professor; her research interests include image/video compression coding, image quality assessment, and image/video compressive sensing reconstruction. E-mail: eeclyang@scut.edu.cn
  • Supported by:
    the Natural Science Foundation of Guangdong Province(2019A1515011949)

Abstract:

Existing video compressive sensing reconstruction networks usually employ an optical flow network to perform motion estimation and motion compensation in the pixel domain. During reconstruction, however, the input to the optical flow network is a low-quality estimated frame, which yields inaccurate optical flow. Pixel-domain alignment and fusion based on such flow accumulates noise, producing obvious artifacts in the reconstructed video frames and degrading reconstruction quality. Based on the fact that the multi-channel information of the feature space is highly robust to interference noise, this paper applied the idea of feature-space optimization to the design of a video compressive sensing reconstruction neural network and proposed a feature-space optimization-inspired and flow-guided multi-hypothesis cross-attention network (FOFMCNet). To avoid the destruction of image structure caused by noisy optical flow when warping images, the study designed a flow-guided multi-hypothesis motion estimation module and a cross-attention-based motion compensation module that realize inter-frame motion estimation and motion compensation in the feature space, thereby making full use of inter-frame correlation to assist non-key-frame reconstruction. To strengthen the reuse of effective information during feature optimization, improve the network's learning ability, and alleviate the gradient explosion problem, this paper designed a feature-space optimization-inspired U-shaped network (FOUNet) as a sub-network of FOFMCNet. By cascading multiple FOUNets, FOFMCNet optimizes and reconstructs non-key frames in the feature space.
Experimental results show that the reconstruction quality of the proposed algorithm is clearly better than that of existing video compressive sensing algorithms on both classical low-resolution datasets (UCF-101 and QCIF) and a newer high-resolution dataset (REDS4).
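The core fusion step the abstract describes, combining several flow-guided hypothesis features of the key frame into the non-key-frame feature via cross-attention, can be illustrated with a minimal sketch. This is a simplified, single-head scaled dot-product attention over flattened spatial positions; the function names, tensor shapes, and the omission of learned query/key/value projections are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(nonkey_feat, hypo_feats):
    """Fuse multi-hypothesis aligned key-frame features into a
    non-key-frame feature with scaled dot-product cross-attention.

    nonkey_feat: (N, d)    features at N spatial positions of the non-key frame
    hypo_feats:  (H, N, d) H flow-guided hypothesis features of the key frame
    """
    H, N, d = hypo_feats.shape
    q = nonkey_feat                    # queries: the frame being reconstructed
    kv = hypo_feats.reshape(H * N, d)  # keys/values: all hypothesis positions
    scores = q @ kv.T / np.sqrt(d)     # (N, H*N) similarity scores
    attn = softmax(scores, axis=-1)    # each row sums to 1
    return attn @ kv                   # (N, d) fused feature map
```

Because each attention row is a convex combination, the fused feature stays within the range of the hypothesis features, which is one way a feature-space fusion avoids the hard pixel-warping errors the abstract attributes to noisy optical flow.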

Key words: video compressive sensing, feature-space optimization, convolutional neural network, attention mechanism, motion estimation and compensation

CLC Number: