An Open-World Object Detection Method Capable of Addressing Label Bias Issues

HUANG Yangyang; XU Yong; XI Xing; LUO Ronghua

doi:10.12141/j.issn.1000-565X.240109

Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 3: 12-19

Computer Science and Technology

School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China

HUANG Yangyang (born 1992), male, Ph.D. candidate, whose research focuses on computer vision and deep learning. E-mail: huangyangy@whu.edu.cn

LUO Ronghua (born 1975), male, Ph.D., associate professor, whose research focuses on intelligent robots, robot vision, and the principles of robot cognition. E-mail: rhluo@scut.edu.cn

Received date: 2024-03-11

Online published: 2024-07-05

Supported by

the National Key R & D Program of China (2024YFE0105400); the Guangzhou Major Special Project for Industry-University-Research Collaborative Innovation (201802010073)

Cite this article

HUANG Yangyang, XU Yong, XI Xing, LUO Ronghua. An Open-World Object Detection Method Capable of Addressing Label Bias Issues[J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(3): 12-19. DOI: 10.12141/j.issn.1000-565X.240109

Abstract

Open-world object detection (OWOD) extends object detection to more complex, dynamic real-world scenarios. It requires a system to recognize all known and unknown objects in an image and to learn incrementally from newly introduced knowledge. However, current OWOD methods typically label regions with high objectness scores as unknown objects and rely heavily on the supervision of known objects. Although these methods can detect unknown objects that resemble known ones, they suffer from a severe label bias problem: they tend to treat all regions that are dissimilar to known objects as part of the background. To address this issue, this study first proposes an unsupervised region proposal generation method based on a large vision model to improve the model's ability to detect unknown objects. Then, because the sensitivity of the region of interest (ROI) classification stage to new categories during training can impair the generalization of the region proposal network (RPN) in the proposal generation stage, a joint training method that decouples RPN proposal generation from ROI classification is proposed to improve the model's ability to resolve the label bias problem. Experimental results show that the proposed method greatly improves unknown object detection on the MS-COCO dataset: the recall of unknown categories reaches 52.1%, more than twice that of previous state-of-the-art (SOTA) methods, while performance on known object categories remains competitive. In terms of inference speed, the model is built with a pure convolutional neural network rather than dense attention mechanisms, and its frame rate is 8.18 f/s higher than that of Deformable DETR-based methods.

Key words: unsupervised learning; open world; incremental learning; object detection
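
The abstract reports results in terms of unknown-category recall and attributes label bias to regions unlike known objects being treated as background. The following is a minimal illustrative sketch of those two ideas. It is not the authors' implementation. The function names (`iou`, `pseudo_label_unknowns`, `unknown_recall`), the 0.5 IoU thresholds, and the greedy one-to-one matching rule are all assumptions chosen for illustration. The proposals are assumed to be class-agnostic boxes that come from some external source, such as boxes derived from the output of a large vision model.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pseudo_label_unknowns(proposals: List[Box], known_gt: List[Box],
                          max_iou: float = 0.5) -> List[Box]:
    """Keep class-agnostic proposals that overlap no known ground-truth box.

    Assumption: the surviving proposals serve as 'unknown' pseudo-labels
    instead of being absorbed into the background.
    """
    return [p for p in proposals
            if all(iou(p, g) < max_iou for g in known_gt)]


def unknown_recall(pred_unknown: List[Box], gt_unknown: List[Box],
                   iou_thr: float = 0.5) -> float:
    """Fraction of unknown ground-truth boxes matched by a predicted box.

    Each prediction may match at most one ground-truth box (greedy matching).
    """
    used = set()
    hits = 0
    for g in gt_unknown:
        best_j, best_iou = -1, iou_thr
        for j, p in enumerate(pred_unknown):
            if j in used:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(gt_unknown) if gt_unknown else 0.0


# Toy data: one proposal coincides with a known object, two do not.
proposals = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
known_gt = [(0, 0, 10, 10)]
gt_unknown = [(21, 21, 31, 31), (80, 80, 90, 90)]

pseudo = pseudo_label_unknowns(proposals, known_gt)
recall = unknown_recall(pseudo, gt_unknown)
print(pseudo, recall)
```

In this toy example, the proposal that coincides with the known box is discarded. One of the two unknown ground-truth boxes is matched, so the recall is 0.5. A real evaluation would follow the benchmark's own matching protocol.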
