Journal of South China University of Technology(Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (12): 41-48.doi: 10.12141/j.issn.1000-565X.220095

Special Issue: 2022 Computer Science and Technology

• Computer Science & Technology •

Lightweight Object Detection Combined with Multi-Scale Dilated-Convolution and Multi-Scale Deconvolution

YI Qingming1,2, LÜ Renyi1, SHI Min1, LUO Aiwen1

  1. College of Information Science and Technology, Jinan University, Guangzhou 510632, Guangdong, China
    2. Techtotop Microelectronics Technology Co., Ltd., Guangzhou 510663, Guangdong, China
  • Received:2022-03-06 Online:2022-12-25 Published:2022-04-08
  • Contact: LUO Aiwen (b. 1986), female, Ph.D., lecturer; research interests: machine vision and intelligent IC design. E-mail: luoaiwen@jnu.edu.cn
  • About author: YI Qingming (b. 1965), female, Ph.D., professor; research interests: communication signal processing, SoC design, and artificial-intelligence SoC design. E-mail: tyqm@jnu.edu.cn
  • Supported by:
    the National Natural Science Foundation of China(62002134);the Basic and Applied Basic Research Foundation of Guangdong Province(2020A1515110645);the Project of Key Laboratory of New Semiconductors and Devices of Guangdong Province(2021KSY001)

Abstract:

Deep neural networks are difficult to deploy in mobile application scenarios, which are constrained in computing resources yet demand high-speed calculation, owing to their slow detection and large parameter counts. To speed up object-detection inference and achieve a better trade-off between detection accuracy and inference speed, this paper proposed a lightweight object detection network, MDDNet, which combines multi-scale dilated convolution with multi-scale deconvolution. Firstly, a lightweight detection backbone was designed based on an efficient single-stage strategy, and depthwise separable convolution was introduced to reduce the parameter count of the baseline and further speed up feature extraction. Secondly, two feature-extension branches based on multi-scale dilated convolution were added to the backbone, connected respectively to the ends of the final and penultimate residual layers; the features of the two branches were fused in the prediction layer to augment the texture features of the shallow feature maps. Thirdly, a multi-scale deconvolution module was connected to the deep network layers to enlarge the feature maps, which were then fused with the shallow feature maps of different scales from earlier layers, so as to enrich the semantic and detail information of the features and improve detection accuracy. Finally, the parameters of the prior bounding boxes in the prediction layer were optimized with the K-means clustering method, so that the prior boxes better match the ground truth of the objects, yielding higher recognition accuracy. The experimental results show that MDDNet has about 7.21×10⁶ parameters, achieves average precision of 58.7% on the KITTI dataset and 76.0% on the Pascal VOC dataset, and reaches inference speeds of 55 f/s and 52 f/s on the two datasets, respectively. MDDNet therefore achieves a decent trade-off among parameter count, detection speed, and detection accuracy, and can be applied to real-time object detection on mobile terminals.
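The parameter savings from depthwise separable convolution can be checked by simple counting: the k×k spatial filtering and the cross-channel mixing are factorized into a depthwise step and a 1×1 pointwise step. The sketch below compares the two parameter counts; the channel sizes are illustrative and not taken from the paper.

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k kernel per (input, output) channel pair
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    # Depthwise: one k x k kernel per input channel;
    # pointwise: a 1 x 1 convolution that mixes channels
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 128, 256, 3
std = conv_params(c_in, c_out, k)
sep = dw_separable_params(c_in, c_out, k)
print(std, sep, round(sep / std, 3))  # the ratio is roughly 1/c_out + 1/k**2
```

For a 3×3 layer the factorization keeps only about an eighth to a ninth of the parameters, which is why it both shrinks the baseline and speeds up feature extraction.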
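The abstract does not give the exact dilation rates of the extension branches or the deconvolution settings, but the geometry behind both modules follows from two standard formulas: the effective size of a dilated kernel and the output size of a deconvolution (transposed convolution). A minimal sketch under assumed example settings (dilation rates 1, 2, 4 and a 4×4, stride-2, padding-1 deconvolution):

```python
def effective_kernel(k, d):
    # A k x k kernel with dilation rate d samples inputs d pixels apart,
    # so it covers k + (k - 1) * (d - 1) positions per axis
    return k + (k - 1) * (d - 1)

def deconv_out(n, k, s, p):
    # Output size of a transposed convolution: (n - 1) * stride - 2 * pad + kernel
    return (n - 1) * s - 2 * p + k

# Parallel 3x3 branches with different dilation rates see multi-scale context
# (3x3, 5x5, 9x9 regions) at the same parameter cost as a plain 3x3 kernel.
for d in (1, 2, 4):
    print("dilation", d, "-> effective kernel", effective_kernel(3, d))

# A 4x4 deconvolution with stride 2 and padding 1 doubles a 13x13 map to 26x26,
# letting deep feature maps be fused with larger, shallower ones.
print(deconv_out(13, 4, 2, 1))
```

This is why the deconvolution module can upsample deep, semantically rich maps to the resolution of shallow maps before fusion.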
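K-means clustering of prior bounding boxes is commonly run with the distance 1 − IoU between box shapes rather than Euclidean distance, so that small and large boxes are treated fairly. The abstract does not specify the exact variant, so the following is a sketch of that common approach, with made-up box sizes:

```python
import random

def iou_wh(box, centroid):
    # IoU of two (width, height) boxes assumed to share the same centre
    w = min(box[0], centroid[0])
    h = min(box[1], centroid[1])
    inter = w * h
    return inter / (box[0] * box[1] + centroid[0] * centroid[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the closest centroid (distance = 1 - IoU)
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[best].append(b)
        # Update each centroid to the mean width/height of its cluster
        new = []
        for i, c in enumerate(clusters):
            if not c:
                new.append(centroids[i])
            else:
                new.append((sum(b[0] for b in c) / len(c),
                            sum(b[1] for b in c) / len(c)))
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

# Toy ground-truth box shapes: three small and three large objects
boxes = [(10, 12), (12, 10), (11, 11), (100, 90), (90, 100), (95, 95)]
print(kmeans_anchors(boxes, 2))
```

The resulting cluster centres replace hand-set prior box sizes, so the priors start closer to the ground-truth shapes of the target dataset.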

Key words: object detection, dilated convolution, deconvolution, multi-scale, accuracy-speed tradeoff
