Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (3): 12-19. doi: 10.12141/j.issn.1000-565X.240109

• Computer Science & Technology •

An Open-World Object Detection Method Capable of Addressing Label Bias Issues

HUANG Yangyang, XU Yong, XI Xing, LUO Ronghua

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2024-03-11 Online: 2025-03-10 Published: 2024-07-05
  • Contact: LUO Ronghua E-mail: huangyangy@whu.edu.cn; rhluo@scut.edu.cn
  • Supported by:
    the National Key R&D Program of China (2024YFE0105400)

Abstract:

Open-world object detection (OWOD) extends object detection to more complex, dynamic real-world scenarios: the system must recognize all known and unknown object categories in an image and learn incrementally as new knowledge is introduced. However, current OWOD methods typically label regions with high objectness scores as unknown objects and rely heavily on supervision from known objects. Although these methods can detect unknown objects that resemble known ones, they suffer from a significant label bias problem, in which regions dissimilar to known objects are often misclassified as background. To address this issue, this study first proposes an unsupervised region proposal generation method based on a large visual model to enhance the model's ability to detect unknown objects. Then, because the sensitivity of the Region of Interest (ROI) classification stage to new categories during training can degrade the generalization of the Region Proposal Network (RPN) in the proposal generation stage, a decoupled joint training method for RPN proposal generation and ROI classification is introduced to improve the model's ability to resolve label bias. Experimental results on the MS-COCO dataset show that the proposed method significantly improves unknown object detection: the unknown-category recall reaches 52.1%, more than twice that of previous state-of-the-art (SOTA) methods, while remaining competitive on known object categories. In terms of inference speed, the model, which is built from pure convolutional neural networks rather than dense attention mechanisms, achieves a frame rate 8.18 f/s higher than that of Deformable DETR-based methods.
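
To make the unknown-category recall figure concrete, the following is a minimal sketch of how such a recall could be computed for one image. It is not the evaluation code used in the study: the IoU threshold of 0.5, the greedy one-to-one matching, and the names `iou` and `unknown_recall` are assumptions introduced for illustration only.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_recall(
    gt_unknown: List[Box],
    pred_unknown: List[Box],
    iou_thresh: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Fraction of ground-truth unknown boxes matched by a predicted unknown box."""
    if not gt_unknown:
        return 0.0
    used = set()  # indices of predictions already matched to a ground truth
    hits = 0
    for g in gt_unknown:
        best_j, best_iou = -1, iou_thresh
        for j, p in enumerate(pred_unknown):
            if j in used:
                continue
            overlap = iou(g, p)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(gt_unknown)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(unknown_recall(gt, pred))  # one of two unknown objects is found: 0.5
```

Under this definition, an unknown object that the detector assigns to the background contributes a miss, which is why label bias directly lowers the recall value.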

Key words: unsupervised learning, open world, incremental learning, object detection
