Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (6): 110-119. doi: 10.12141/j.issn.1000-565X.230105

• Computer Science & Technology •

Improvement of Cross-Dataset Performance of Face Forgery Detection Based on Multi-Scale Spatiotemporal Features and Tampering Probabilities

HU Yongjian1, ZHUO Sichao1, LIU Beibei1, WANG Yufei2, LI Jicheng1

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
  2. School of Criminal Science and Technology, Guangdong Police College, Guangzhou 510440, Guangdong, China
  • Received: 2023-03-13 Online: 2024-06-25 Published: 2023-11-08
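  • Contact: LIU Beibei (born 1980), female, lecturer, master's supervisor, mainly engaged in research on multimedia information security. E-mail: eebbliu@scut.edu.cn
  • About author: HU Yongjian (born 1962), male, professor, doctoral supervisor, mainly engaged in research on multimedia information security, image processing, and artificial intelligence and its applications. E-mail: eeyjhu@scut.edu.cn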
  • Supported by:
    the Scientific Research Capability Improvement Program for Key Discipline Construction of Guangdong Province (2021ZDJS047); the Characteristic Innovation Project of Colleges and Universities in Guangdong Province (Natural Science) (2023KTSCX093)

Abstract:

Most existing Deepfake face forgery detection algorithms generalize poorly even though their intra-dataset detection performance is fairly good. These methods rely mainly on local features that are prone to overfitting, which leads to unsatisfactory cross-dataset detection performance. To address this problem, a face forgery detection method based on multi-scale spatiotemporal features and tampering probabilities is proposed. By detecting the inevitable temporal inconsistency between consecutive frames in Deepfake videos, the method maintains good performance under cross-dataset testing, cross-forgery testing, and video compression. The proposed method consists of three modules: a multi-scale spatiotemporal feature extraction module that reveals the discontinuity traces of fake videos in the temporal domain; a three-dimensional dual-attention module that adaptively computes the correlation between multi-scale spatiotemporal features; and an auxiliary supervision module that predicts the tampering probabilities of randomly selected pixels to form a supervision mask. The proposed algorithm is compared with the baseline algorithm and the latest related works on large-scale public benchmark datasets, including FF++, DFD, DFDC and CDF. Experimental results show that the proposed algorithm achieves the best overall performance in cross-dataset testing and under video compression, and above-average performance in cross-forgery testing, while maintaining good average performance in intra-dataset testing. These results demonstrate the effectiveness of the proposed algorithm.
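
The idea of looking for temporal inconsistency at several time scales can be illustrated with a toy sketch. The following is not the network described in the abstract: it is a hand-written proxy, under the assumption that frames are small grayscale grids, that measures the mean absolute difference between frames a given number of steps apart. The function name `temporal_inconsistency`, the `strides` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def temporal_inconsistency(frames, strides=(1, 2, 4)):
    """Return {stride: mean absolute difference between frames `stride` apart}.

    `frames` is a list of equally sized 2-D grayscale frames (lists of rows).
    This is an illustrative proxy only, not the learned feature extractor
    from the paper.
    """
    scores = {}
    for s in strides:
        per_pair = []
        for t in range(len(frames) - s):
            a, b = frames[t], frames[t + s]
            # Sum absolute pixel differences over the whole frame pair.
            total = sum(
                abs(pa - pb)
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b)
            )
            per_pair.append(total / (len(a) * len(a[0])))
        # Average over all frame pairs available at this stride.
        scores[s] = sum(per_pair) / len(per_pair) if per_pair else 0.0
    return scores


def constant_frame(value, size=4):
    """Build a size x size frame filled with a single intensity (toy data)."""
    return [[value] * size for _ in range(size)]


# A steady clip, and the same clip with one abruptly brighter frame (a "flicker").
steady = [constant_frame(10) for _ in range(8)]
flicker = [constant_frame(50 if t == 3 else 10) for t in range(8)]

steady_scores = temporal_inconsistency(steady)
flicker_scores = temporal_inconsistency(flicker)
```

In this toy setting the steady clip scores zero at every stride, while the clip with a single inconsistent frame scores above zero at all of them. A learned model would replace the raw pixel difference with spatiotemporal features, but the comparison across several temporal strides conveys the multi-scale intuition.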

Key words: face forgery detection, cross-dataset performance, multi-scale spatiotemporal feature, attention mechanism, tampering probability

CLC Number: