Journal of South China University of Technology (Natural Science Edition), 2020, Vol. 48, Issue (12): 52-62. doi: 10.12141/j.issn.1000-565X.200083

• Electronics, Communication & Automation Technology •

Monocular Image Depth Estimation Based on Multi-Scale Attention Oriented Network

LIU Jieping, WEN Junwen, LIANG Yaling

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2020-02-26 Revised: 2020-04-14 Online: 2020-12-25 Published: 2020-12-01
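  • Contact: LIANG Yaling (born 1977), female, Ph.D., associate professor; research interests include machine learning and image processing. E-mail: ylliang@scut.edu.cn
  • About author: LIU Jieping (born 1961), female, Ph.D., associate professor; research interests include image, video, and 3D signal processing. E-mail: eeliujp@scut.edu.cn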
  • Supported by:
    the National Natural Science Foundation of China (61701181, 61471173) and the Natural Science Foundation of Guangdong Province (2017A030325430)

Abstract:

To address the low spatial resolution and blurred edges produced by existing deep-learning-based algorithms for monocular image depth estimation, a depth estimation algorithm based on a multi-scale attention-oriented network is proposed. First, an end-to-end encoder-decoder model is designed, in which the encoder extracts features at multiple scales. To ensure better depth continuity, the decoder progressively refines the details and scene structure of the extracted multi-scale features by combining residual learning with channel attention fusion. To compensate for the loss of depth detail caused by repeated down-sampling, a boundary enhancement module is designed; it introduces spatial attention to increase the inter-class contrast between different objects and thereby enhance the boundary details of the image. Finally, an optimization module fuses the multi-scale features from the decoder and the boundary enhancement module to generate the depth image. Experimental results show that, compared with current mainstream algorithms, the proposed algorithm generates depth images of higher quality with more detailed object contour information, and performs well in both objective metrics and subjective visual quality.
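
The abstract names channel attention and spatial attention without giving their formulation. As a rough illustration of the general idea only, the dependency-free Python sketch below gates a small feature map per channel and per pixel. It is not the authors' implementation: the function names `channel_attention` and `spatial_attention` are hypothetical, and the gates are parameter-free (a sigmoid of a plain average), whereas attention modules in practice usually compute their gates with learned weights.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features):
    """Rescale each channel by a gate derived from its global average.

    features: list of C channels, each an H x W list of lists of floats.
    Assumption: the gate is sigmoid(mean), with no learned weights.
    """
    gated = []
    for channel in features:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))  # one scalar per channel
        gated.append([[v * gate for v in row] for row in channel])
    return gated


def spatial_attention(features):
    """Rescale each pixel by a gate derived from the cross-channel mean.

    Assumption: the mask is sigmoid(mean over channels), with no learned
    convolution; the same H x W mask is applied to every channel.
    """
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    mask = [
        [
            sigmoid(sum(features[c][i][j] for c in range(num_channels)) / num_channels)
            for j in range(width)
        ]
        for i in range(height)
    ]
    return [
        [[features[c][i][j] * mask[i][j] for j in range(width)] for i in range(height)]
        for c in range(num_channels)
    ]


# Toy input: 2 channels of size 2 x 2.
toy = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
by_channel = channel_attention(toy)
by_pixel = spatial_attention(toy)
```

In this toy input the channel gate depends only on each channel's own average, while the spatial mask is shared across channels, which is the distinction between the two attention types that the abstract relies on.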

Key words: deep learning, monocular image depth estimation, multi-scale attention-oriented network, multi-scale feature, channel attention fusion

CLC Number: