Journal of South China University of Technology(Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (10): 41-50.doi: 10.12141/j.issn.1000-565X.230673

• Computer Science & Technology •

Semantic-Visual Consistency Constraint Network for Zero-Shot Image Semantic Segmentation

CHEN Qiong, FENG Yuan, LI Zhiqun, YANG Yong

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received:2023-10-29 Online:2024-10-25 Published:2023-12-27
  • About author: CHEN Qiong (b. 1966), female, Ph.D., associate professor; her research focuses on machine learning, imbalanced classification, image classification and segmentation, and deep reinforcement learning. E-mail: csqchen@scut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China(62176095)

Abstract:

Zero-shot image semantic segmentation, one of the important tasks in zero-shot learning for vision, aims to segment novel categories unseen during training. In existing methods based on pixel-level visual feature generation, the distribution of the synthesized visual features is inconsistent with that of real visual features, and the synthesized features reflect class semantic information inadequately, so they have low discriminability. Moreover, some existing generative methods consume substantial computational resources to extract the discriminative information conveyed by semantic features. To address these problems, this paper proposes SVCCNet, a zero-shot image semantic segmentation network based on semantic-visual consistency constraints. SVCCNet uses a semantic-visual consistency constraint module to facilitate the mutual transformation between semantic features and visual features, which strengthens their correlation, reduces the disparity between the spatial structures of real and synthesized visual features, and thereby mitigates the distribution inconsistency between synthesized and real visual features. The module achieves the correspondence between visual features and class semantics through two mutually constrained reconstruction mappings while keeping model complexity low. Experimental results on the PASCAL-VOC and PASCAL-Context datasets show that SVCCNet outperforms mainstream methods in pixel accuracy, mean accuracy, mean intersection over union (IoU), and harmonic IoU.
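The abstract's core mechanism, two mutually constrained reconstruction mappings between class semantics and visual features, can be sketched as a pair of small networks with cycle-reconstruction losses. This is a minimal illustrative sketch only, not the authors' implementation: the module name, layer sizes, and feature dimensions (300-d semantic embeddings, 512-d visual features) are assumptions for demonstration.

```python
# Hedged sketch of a semantic-visual consistency constraint via two
# mutually constrained reconstruction mappings. All names and dimensions
# are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticVisualConsistency(nn.Module):
    def __init__(self, sem_dim=300, vis_dim=512, hidden=256):
        super().__init__()
        # semantic -> visual generator (synthesizes visual features for a class)
        self.s2v = nn.Sequential(nn.Linear(sem_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, vis_dim))
        # visual -> semantic mapping (the mutually constraining direction)
        self.v2s = nn.Sequential(nn.Linear(vis_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, sem_dim))

    def forward(self, sem, vis):
        fake_vis = self.s2v(sem)           # synthesize visual features from semantics
        sem_rec = self.v2s(fake_vis)       # reconstruct semantics from synthesized features
        vis_rec = self.s2v(self.v2s(vis))  # reconstruct real visual features via semantics
        # The two reconstruction losses constrain each mapping with the other,
        # tying the synthesized feature space to the real one.
        loss = F.mse_loss(sem_rec, sem) + F.mse_loss(vis_rec, vis)
        return fake_vis, loss
```

Minimizing both reconstruction terms jointly is what couples the two mappings: neither direction can drift without increasing the other's reconstruction error, which is one plausible way to keep synthesized features aligned with the real visual distribution at low model complexity.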

Key words: semantic segmentation, feature generation, zero-shot learning, computer vision, deep learning
