华南理工大学学报(自然科学版) ›› 2025, Vol. 53 ›› Issue (3): 12-19. doi: 10.12141/j.issn.1000-565X.240109

• 计算机科学与技术 •

一种可解决标签偏差问题的开放世界目标检测方法

黄阳阳, 许勇, 席星, 罗荣华

  1. 华南理工大学 计算机科学与工程学院,广东 广州 510006
  • 收稿日期:2024-03-11 出版日期:2025-03-10 发布日期:2024-07-05
  • 通信作者: 罗荣华 E-mail:huangyangy@whu.edu.cn;rhluo@scut.edu.cn
  • 作者简介:黄阳阳(1992—),男,博士生,主要从事计算机视觉和深度学习研究。E-mail: huangyangy@whu.edu.cn
  • 基金资助:
    国家重点研发计划项目(2024YFE0105400);广州市产学研协同创新重大专项(201802010073)

An Open-World Object Detection Method Capable of Addressing Label Bias Issues

HUANG Yangyang, XU Yong, XI Xing, LUO Ronghua

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received:2024-03-11 Online:2025-03-10 Published:2024-07-05
  • Contact: LUO Ronghua E-mail:huangyangy@whu.edu.cn;rhluo@scut.edu.cn
  • Supported by:
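    the National Key R&D Program of China (2024YFE0105400); the Guangzhou Major Special Project for Industry-University-Research Collaborative Innovation (201802010073)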

摘要:

开放世界目标检测(OWOD)将目标检测问题推广到更为复杂的现实动态场景,要求系统能够识别图像中所有已知和未知目标的类别,并且具有根据新引入知识进行增量学习的能力。然而,当前的开放世界目标检测方法通常将高对象分数的区域标记为未知对象,且在很大程度上依赖于已知对象的监督。尽管这些方法能够检测出与已知对象相似的未知对象,但存在严重的标签偏差问题,即倾向于将与已知对象不相似的所有区域都视为背景的一部分。为解决此问题,该文首先提出了一种基于视觉大模型的无监督区域提议生成方法,以提高模型检测未知对象的能力;然后,针对模型训练过程中,感兴趣区域(ROI)分类阶段对新类别的敏感性会影响区域提议网络(RPN)在提议生成阶段的泛化性能,提出了解耦RPN区域提议生成和ROI分类的联合训练方法,以提高模型解决标签偏差问题的能力。实验结果表明:所提方法在MS-COCO数据集上检测未知对象的性能取得了巨大的提升,未知类别的召回率是SOTA方法的2倍以上,达到了52.1%,同时在检测已知对象类别方面也保持了竞争性;在推理速度方面,该文模型使用纯卷积神经网络构建,而不是使用密集注意力机制,帧率比基于可变形的DETR方法多8.18 f/s。

关键词: 无监督, 开放世界, 增量学习, 目标检测

Abstract:

Open-world object detection (OWOD) extends object detection to more complex, dynamic real-world scenarios: the system must recognize all known and unknown object categories in an image and must be able to learn incrementally from newly introduced knowledge. However, current OWOD methods typically label regions with high objectness scores as unknown objects and rely heavily on supervision from known objects. Although these methods can detect unknown objects that resemble known ones, they suffer from a severe label bias problem: they tend to treat every region that is dissimilar to known objects as part of the background. To address this issue, this study first proposes an unsupervised region proposal generation method based on a large vision model to improve the model's ability to detect unknown objects. Then, because the sensitivity of the region of interest (ROI) classification stage to new categories during training can impair the generalization of the region proposal network (RPN) in the proposal generation stage, a joint training method that decouples RPN proposal generation from ROI classification is proposed to improve the model's ability to handle label bias. Experimental results show that the proposed method substantially improves the detection of unknown objects on the MS-COCO dataset: the recall of unknown categories reaches 52.1%, more than twice that of the previous SOTA methods, while performance on known object categories remains competitive. In terms of inference speed, the model is built from pure convolutional neural networks rather than dense attention mechanisms, and its frame rate is 8.18 f/s higher than that of deformable DETR-based methods.
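The unknown-category recall reported above can be illustrated with a minimal sketch. The abstract does not state the evaluation protocol, so the IoU threshold of 0.5, the greedy one-to-one matching, and the helper names `iou` and `unknown_recall` are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_recall(gt_unknown: List[Box],
                   pred_unknown: List[Box],
                   iou_thr: float = 0.5) -> float:
    """Fraction of ground-truth unknown objects matched by a prediction
    labeled 'unknown'. Each prediction may match at most one ground truth.
    iou_thr = 0.5 is an assumed threshold, not taken from the paper."""
    used = set()  # indices of predictions already matched
    hits = 0
    for gt in gt_unknown:
        best_j, best_iou = -1, iou_thr
        for j, pred in enumerate(pred_unknown):
            if j in used:
                continue
            overlap = iou(gt, pred)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(gt_unknown) if gt_unknown else 0.0


# Toy example with made-up boxes: one of two unknown objects is found.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
recall = unknown_recall(gt, pred)
```

Under this definition, a detector that labels every region dissimilar to known objects as background scores low on unknown recall regardless of its accuracy on known classes, which is the label bias effect the abstract describes.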

Key words: unsupervised, open world, incremental learning, object detection

中图分类号: