Lightweight Object Detection Combined with Multi-Scale Dilated-Convolution and Multi-Scale Deconvolution

YI Qingming, LÜ Renyi, SHI Min, et al

doi:10.12141/j.issn.1000-565X.220095

Journal of South China University of Technology (Natural Science), 2022, Vol. 50, Issue 12: 41-48

Computer Science & Technology

1. College of Information Science and Technology, Jinan University, Guangzhou 510632, Guangdong, China
2. Techtotop Microelectronics Technology Co., Ltd., Guangzhou 510663, Guangdong, China

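YI Qingming (1965-), female, Ph.D., professor, mainly engaged in research on communication signal processing and SoC design, and artificial intelligence SoC design. E-mail: tyqm@jnu.edu.cn.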

Received date: 2022-03-06

Online published: 2022-04-07

Supported by the National Natural Science Foundation of China (62002134); the Basic and Applied Basic Research Foundation of Guangdong Province (2020A1515110645); the Project of Key Laboratory of New Semiconductors and Devices of Guangdong Province (2021KSY001)

Abstract

Deep neural networks are slow at inference and carry large parameter counts, which makes them ill-suited to mobile scenarios where computing resources are limited but high-speed computation is required. To improve the inference speed of object detection and achieve a better tradeoff between detection accuracy and inference speed, this paper proposes a lightweight object detection network named MDDNet, which combines multi-scale dilated convolution and multi-scale deconvolution. Firstly, a lightweight detection backbone network is designed based on an efficient single-stage strategy, and depthwise separable convolution is introduced to reduce the number of parameters of the baseline and further speed up feature extraction. Secondly, two feature extension branches based on multi-scale dilated convolution are added to the backbone network, connected to the ends of the final and the penultimate residual layers of the basic network, respectively. The features of the two branches are fused in the prediction layer to augment the texture features of the shallow feature maps. Thirdly, a multi-scale deconvolution module is connected to the deep feature layers to increase the size of the feature map, and the enlarged map is then fused with the shallow feature maps of different scales from the previous layer, enriching both the semantic information and the detail information of the features and improving the detection accuracy. Finally, the parameters of the prior bounding boxes are optimized in the prediction layer with the K-means clustering method, so that the prior boxes better match the ground-truth boxes of the objects, achieving higher object recognition accuracy. The experimental results show that MDDNet has about 7.21×10⁶ parameters. The average accuracy is 58.7% on the KITTI dataset and 76.0% on the Pascal VOC dataset, with inference speeds of 55 f/s and 52 f/s, respectively. Therefore, MDDNet achieves a decent tradeoff among parameter count, detection speed, and detection accuracy, and it can be applied to real-time object detection on mobile terminals.
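The final step of the method, fitting prior bounding boxes to the data with K-means, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which distance measure is used, so the sketch assumes the common choice of 1 - IoU between (width, height) pairs, and the names `iou_wh` and `kmeans_anchors` as well as the sample boxes are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k prior boxes.

    Assumption: distance = 1 - IoU, so each box joins the centroid
    with which it has the highest IoU.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initialise from k ground-truth boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_centroids.append((w, h))
            else:
                # keep the old centroid if no box was assigned to it
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    # return the priors ordered from smallest to largest area
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical ground-truth box sizes in pixels: a small group and a large group.
sample_boxes = [(10, 10), (11, 10), (10, 11), (100, 200), (101, 199), (99, 201)]
anchors = kmeans_anchors(sample_boxes, k=2)
```

In practice the input would be the widths and heights of all labelled boxes in the training set, and `k` would be set to the number of prior boxes the prediction layer uses.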

Key words: object detection; dilated convolution; deconvolution; multi-scale; accuracy-speed tradeoff

Cite this article

YI Qingming, LÜ Renyi, SHI Min, et al. Lightweight Object Detection Combined with Multi-Scale Dilated-Convolution and Multi-Scale Deconvolution[J]. Journal of South China University of Technology (Natural Science), 2022, 50(12): 41-48. DOI: 10.12141/j.issn.1000-565X.220095
