Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (6): 10-18. doi: 10.12141/j.issn.1000-565X.210603

Special Topic: Computer Science and Technology, 2022

• Computer Science and Technology •

Image Tampering Localization Based on Multi-Scale Visual Transformer

陆璐 钟文煜 吴小坤   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2021-09-17  Revised: 2021-10-27  Online: 2022-06-25  Published: 2021-11-08
  • Corresponding author: WU Xiaokun (b. 1980), female, professor, mainly engaged in research on data analysis and information visualization. E-mail: wuxiaokun@scut.edu.cn
  • About the author: LU Lu (b. 1971), male, professor, mainly engaged in research on computer vision and software quality assurance
  • Supported by:
    the Major Project of the National Social Science Fund of China; the Major Industry-University-Research Project of Zhongshan City

Image tampering localization based on multi-scale visual Transformer

LU Lu  ZHONG Wenyu  WU Xiaokun   

  1. School of Computer Science and Engineering,South China University of Technology,Guangzhou 510640,Guangdong,China
  • Received:2021-09-17 Revised:2021-10-27 Online:2022-06-25 Published:2021-11-08
  • Contact: WU Xiaokun (b. 1980), female, professor, mainly engaged in research on data analysis and information visualization. E-mail: wuxiaokun@scut.edu.cn
  • About author: LU Lu (b. 1971), male, professor, mainly engaged in research on computer vision and software quality assurance
  • Supported by:
    the Major Project of the National Social Science Fund of China; the Major Industry-University-Research Project of Zhongshan City

Abstract: With the continuous development of digital image processing technology, image tampering is no longer limited to a single technique such as image splicing; instead, post-processing with image-editing software is used to hide the traces of malicious tampering, so that existing traditional algorithms and deep-learning-based localization methods perform poorly. To address the low localization accuracy of existing image tampering algorithms, this paper proposes an end-to-end image tampering localization network based on a multi-scale visual Transformer, which fuses a Transformer and a convolutional encoder to extract the feature differences between tampered and untampered regions. The multi-scale Transformer models the spatial information of image-patch sequences of different sizes, allowing the network to adapt to tampered regions of various shapes and sizes. Experimental results show that the proposed algorithm achieves F1 scores of 0.431 and 0.877 and AUC values of 0.728 and 0.971 on the CASIA and NIST2016 test sets, respectively, a clear performance improvement over current mainstream algorithms. Moreover, the proposed algorithm is robust against JPEG compression attacks.

Keywords: deep learning, visual Transformer, image tampering, vertical and horizontal attention

Abstract: With the continuous development of digital image processing technology, image tampering is no longer limited to a single method such as image splicing. The traces of malicious tampering are hidden through post-processing with image-editing software, which leads to poor results from traditional image forgery detection algorithms and from tampering localization methods based on deep learning. Aiming at the low localization accuracy of existing image tampering algorithms, an end-to-end image tampering localization network based on a Multi-Scale Visual Transformer is proposed. The network combines a Transformer and a convolutional encoder to extract the feature difference between the tampered area and the non-tampered area. The Multi-Scale Transformer models the spatial information of image block sequences of different sizes, so that the network can adapt to tampered areas of various shapes and sizes. Experimental results show that the F1 scores of the proposed algorithm on the CASIA and NIST2016 test sets are 0.431 and 0.877, and the AUC values are 0.728 and 0.971, respectively, which is a significant improvement over existing mainstream algorithms. Moreover, the proposed algorithm is robust against JPEG compression attacks.
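The core idea of the abstract, that the same image is partitioned into patch ("image block") sequences at several patch sizes so that attention can cover tampered regions of different scales, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the patch sizes (8 and 16) are assumptions made for the example; a real model would linearly embed each patch and feed each sequence into its own Transformer branch before fusing with convolutional features.

```python
# Hypothetical sketch of multi-scale patch partitioning: the same image is
# split into patch sequences at several patch sizes, so each Transformer
# branch can model tampered regions at a different spatial scale.

def to_patch_sequence(image, patch):
    """Split an H x W single-channel image (list of lists) into a row-major
    sequence of flattened patch vectors. Assumes H and W divisible by patch."""
    h, w = len(image), len(image[0])
    seq = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec = [image[top + i][left + j]
                   for i in range(patch) for j in range(patch)]
            seq.append(vec)
    return seq

def multi_scale_sequences(image, patch_sizes=(8, 16)):
    """One token sequence per scale; in a full model each patch would be
    linearly embedded and each sequence processed by its own branch."""
    return {p: to_patch_sequence(image, p) for p in patch_sizes}

# Example: a 32x32 image yields 16 tokens at patch size 8 and 4 at size 16.
img = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
seqs = multi_scale_sequences(img)
print(len(seqs[8]), len(seqs[16]))  # 16 4
```

Smaller patches give longer sequences with finer spatial resolution, which is what lets the fused network localize both small splices and large tampered regions.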

Key words: deep learning, visual Transformer, image tampering, vertical and horizontal attention

CLC number: