Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (6): 110-119. doi: 10.12141/j.issn.1000-565X.230105

• Computer Science & Technology •


Improvement of Cross-Dataset Performance of Face Forgery Detection Based on Multi-Scale Spatiotemporal Features and Tampering Probabilities

HU Yongjian1, ZHUO Sichao1, LIU Beibei1†, WANG Yufei2, LI Jicheng1

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
    2. School of Criminal Science and Technology, Guangdong Police College, Guangzhou 510440, Guangdong, China
  • Received: 2023-03-13 Online: 2024-06-25 Published: 2023-11-08
  • Contact: LIU Beibei (b. 1980), female, lecturer, master's supervisor; her research focuses on multimedia information security. E-mail: eebbliu@scut.edu.cn
  • About author: HU Yongjian (b. 1962), male, professor, doctoral supervisor; his research interests include multimedia information security, image processing, and artificial intelligence and its applications. E-mail: eeyjhu@scut.edu.cn
  • Supported by:
    the Scientific Research Capability Improvement Program for Key Discipline Construction of Guangdong Province (2021ZDJS047); the International Science and Technology Cooperation Project of Guangzhou Development District (2022GH15); the National Fund Cultivation Project of China People's Police University (JJPY202402); the Characteristic Innovation Project of Colleges and Universities in Guangdong Province (Natural Science) (2023KTSCX093)


Abstract:

Most existing Deepfake face forgery detection algorithms suffer from insufficient generalization performance: although their intra-dataset detection performance is fairly good, they rely mainly on local features that are prone to overfitting, which leads to unsatisfactory cross-dataset detection performance. To solve this problem, this paper proposes a face forgery detection method based on multi-scale spatiotemporal features and tampering probabilities, which maintains good performance under cross-dataset testing, cross-forgery testing, and video compression by detecting the temporal inconsistency that inevitably arises between consecutive frames in Deepfake videos. The proposed method consists of three modules: a multi-scale spatiotemporal feature extraction module that reveals the discontinuity traces fake videos leave in the temporal domain, a three-dimensional dual-attention module that adaptively computes the correlations among multi-scale spatiotemporal features, and an auxiliary supervision module that predicts the tampering probabilities of randomly selected pixels to form a supervision mask. The proposed algorithm is compared with the baseline algorithm and recent related works on large-scale public standard datasets including FF++, DFD, DFDC, and CDF. Experimental results show that the proposed algorithm achieves the best overall performance in cross-dataset testing and under video compression, above-average performance in cross-forgery testing, and good average performance in all intra-dataset tests. All the experiments demonstrate the effectiveness of the proposed algorithm.
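The abstract only names the three modules without implementation details. As a loose illustration (not the authors' implementation), the NumPy sketch below shows how a dual attention reweighting over a (channels, time, height, width) feature volume and a sigmoid head producing per-pixel tampering probabilities could be wired; all function names, shapes, and the averaging/softmax choices here are hypothetical stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(feat):
    """Reweight a (C, T, H, W) feature volume along two axes and sum
    the branches: a channel-attention branch (which channels matter)
    and a spatiotemporal branch (which positions in T*H*W matter).
    Hypothetical stand-in for a 3-D dual-attention module."""
    C, T, H, W = feat.shape
    flat = feat.reshape(C, -1)                    # (C, T*H*W)
    # channel attention: weight channels by softmax of their global mean
    ch_w = softmax(flat.mean(axis=1))             # (C,)
    ch_branch = feat * ch_w[:, None, None, None]
    # spatiotemporal attention: weight positions by softmax over T*H*W
    st_w = softmax(flat.mean(axis=0))             # (T*H*W,)
    st_branch = (flat * st_w[None, :]).reshape(C, T, H, W)
    return ch_branch + st_branch

def tampering_probs(scores):
    """Map raw scores at randomly sampled pixel locations to tampering
    probabilities with a sigmoid, as an auxiliary supervision head
    might before comparison against a ground-truth mask."""
    return 1.0 / (1.0 + np.exp(-scores))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 16, 16))        # C=8, T=4, H=W=16
out = dual_attention(feat)                        # same shape as input
probs = tampering_probs(rng.standard_normal(32))  # 32 sampled pixels
```

In a trained network the attention weights would of course be learned (e.g. from convolutional projections) rather than derived from plain means; the sketch only fixes the data flow of reweight-then-fuse plus an auxiliary per-pixel probability output.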

Key words: face forgery detection, cross-dataset performance, multi-scale spatiotemporal feature, attention mechanism, tampering probability, 3D point cloud reconstruction

CLC number: