Journal of South China University of Technology (Natural Science Edition) ›› 2020, Vol. 48 ›› Issue (12): 52-62. doi: 10.12141/j.issn.1000-565X.200083

• Electronics, Communication and Automatic Control •

Monocular Image Depth Estimation Based on Multi-Scale Attention Oriented Network

LIU Jieping, WEN Junwen, LIANG Yaling

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2020-02-26 Revised: 2020-04-14 Published: 2020-12-25 Released online: 2020-12-01
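  • Contact: LIANG Yaling (born 1977), female, Ph.D., associate professor; her research covers machine learning and image processing. E-mail: ylliang@scut.edu.cn
  • About author: LIU Jieping (born 1961), female, Ph.D., associate professor; her research covers image, video, and 3D signal processing. E-mail: eeliujp@scut.edu.cn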
  • Supported by:
    Supported by the National Natural Science Foundation of China (61701181, 61471173) and the Natural Science Foundation of Guangdong Province (2017A030325430)

Abstract:

To address the low spatial resolution and blurred edges of existing deep-learning-based depth estimation algorithms for monocular images, a monocular image depth estimation algorithm based on a multi-scale attention-oriented network is proposed. First, an end-to-end encoder-decoder model is designed, in which the encoder extracts features at multiple scales. To ensure better depth continuity, the decoder combines residual learning with channel attention fusion to progressively refine the details and scene structure of the extracted multi-scale features. Because repeated down-sampling loses depth-map detail, a boundary enhancement module is designed; by introducing spatial attention, it raises the inter-class contrast between different objects and thereby enhances the boundary details of the image. Finally, an optimization module fuses the multi-scale features from the decoder and the boundary enhancement module to generate the depth image. Experimental results show that, compared with current mainstream algorithms, the proposed algorithm produces higher-quality depth images with finer object contours, and performs well in both objective metrics and subjective visual quality.

Key words: deep learning, monocular image depth estimation, multi-scale attention-oriented network, multi-scale feature, channel attention fusion
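
The two attention mechanisms named in the abstract can be illustrated with a small sketch. The abstract gives no layer-level detail, so everything below is an assumption and not the authors' network: the function names `channel_attention_fuse` and `spatial_attention` are hypothetical, and the gates are parameter-free (global or channel-wise averages passed through a sigmoid), whereas a trained model would normally derive them from learned layers. Feature maps are plain nested lists indexed as `[channel][row][column]`.

```python
import math


def sigmoid(x):
    """Logistic function used as the attention gate."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(skip, decoder):
    """Fuse encoder skip features into decoder features, channel by channel.

    Assumption: each channel weight is the sigmoid of the global average
    of the summed features (no learned parameters). The result keeps the
    decoder features as a residual path: out = decoder + w * skip.
    """
    fused = []
    for skip_ch, dec_ch in zip(skip, decoder):
        summed = [[s + d for s, d in zip(rs, rd)]
                  for rs, rd in zip(skip_ch, dec_ch)]
        mean = sum(map(sum, summed)) / (len(summed) * len(summed[0]))
        w = sigmoid(mean)  # one scalar weight per channel
        fused.append([[d + w * s for s, d in zip(rs, rd)]
                      for rs, rd in zip(skip_ch, dec_ch)])
    return fused


def spatial_attention(feat):
    """Reweight every pixel by a gate computed across channels.

    Assumption: the gate is the sigmoid of the channel-wise mean at each
    pixel, so strongly responding locations are kept and weak ones damped.
    Returns the reweighted features and the gate map.
    """
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    gate = [[sigmoid(sum(feat[c][i][j] for c in range(channels)) / channels)
             for j in range(width)]
            for i in range(height)]
    out = [[[feat[c][i][j] * gate[i][j] for j in range(width)]
            for i in range(height)]
           for c in range(channels)]
    return out, gate


if __name__ == "__main__":
    skip = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    decoder = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    fused = channel_attention_fuse(skip, decoder)
    refined, gate = spatial_attention(fused)
    print(fused[0][0][0], gate[0][0])
```

The residual form `decoder + w * skip` means a channel with a low weight still passes the decoder features through unchanged, which is one simple way to combine residual learning with a channel gate. The spatial gate acts per pixel instead of per channel, which is why it suits boundary emphasis.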

CLC Number: