Journal of South China University of Technology (Natural Science Edition) ›› 2024, Vol. 52 ›› Issue (6): 110-119. doi: 10.12141/j.issn.1000-565X.230105

• Computer Science & Technology •


Improvement of Cross-Dataset Performance of Face Forgery Detection Based on Multi-Scale Spatiotemporal Features and Tampering Probabilities

HU Yongjian1, ZHUO Sichao1, LIU Beibei1†, WANG Yufei2, LI Jicheng1

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China
    2. School of Criminal Science and Technology, Guangdong Police College, Guangzhou 510440, Guangdong, China
  • Received: 2023-03-13 Online: 2024-06-25 Published: 2023-11-08
  • Contact: LIU Beibei (b. 1980), female, lecturer, master's supervisor; her research focuses on multimedia information security. E-mail: eebbliu@scut.edu.cn
  • About author: HU Yongjian (b. 1962), male, professor, doctoral supervisor; his research interests include multimedia information security, image processing, and artificial intelligence and its applications. E-mail: eeyjhu@scut.edu.cn
  • Supported by:
    the Scientific Research Capability Improvement Program for Key Discipline Construction of Guangdong Province (2021ZDJS047); the International Science and Technology Cooperation Project of Guangzhou Development District (2022GH15); the National Fund Cultivation Project of China People's Police University (JJPY202402); the Characteristic Innovation Project of Colleges and Universities in Guangdong Province (Natural Science) (2023KTSCX093)


Abstract:

Most existing Deepfake face forgery detection algorithms suffer from insufficient generalization performance: although their intra-dataset detection performance is fairly good, they rely mainly on local features that are prone to overfitting, which leads to unsatisfactory cross-dataset detection performance. To solve this problem, this paper proposes a face forgery detection method based on multi-scale spatiotemporal features and tampering probabilities, which maintains good performance under cross-dataset testing, cross-forgery testing, and video compression by detecting the temporal inconsistency that inevitably arises between consecutive frames in Deepfake videos. The proposed method consists of three modules: a multi-scale spatiotemporal feature extraction module that reveals the discontinuity traces fake videos leave in the temporal domain, a three-dimensional dual-attention module that adaptively computes the correlations among multi-scale spatiotemporal features, and an auxiliary supervision module that predicts the tampering probabilities of randomly selected pixels to form a supervision mask. The proposed algorithm is compared with the baseline algorithm and recent related works on large-scale public standard datasets including FF++, DFD, DFDC, and CDF. Experimental results show that the proposed algorithm achieves the best overall performance in cross-dataset testing and under video compression, above-average performance in cross-forgery testing, and good average performance in all intra-dataset tests. All the experiments demonstrate the effectiveness of the proposed algorithm.
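The abstract only names the three modules without implementation details. As a loose illustration (not the authors' implementation), the NumPy sketch below shows how a dual attention reweighting over a (channels, time, height, width) feature volume and a sigmoid head producing per-pixel tampering probabilities could be wired; all function names, shapes, and the averaging/softmax choices here are hypothetical stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(feat):
    """Reweight a (C, T, H, W) feature volume along two axes and sum
    the branches: a channel-attention branch (which channels matter)
    and a spatiotemporal branch (which positions in T*H*W matter).
    Hypothetical stand-in for a 3-D dual-attention module."""
    C, T, H, W = feat.shape
    flat = feat.reshape(C, -1)                    # (C, T*H*W)
    # channel attention: weight channels by softmax of their global mean
    ch_w = softmax(flat.mean(axis=1))             # (C,)
    ch_branch = feat * ch_w[:, None, None, None]
    # spatiotemporal attention: weight positions by softmax over T*H*W
    st_w = softmax(flat.mean(axis=0))             # (T*H*W,)
    st_branch = (flat * st_w[None, :]).reshape(C, T, H, W)
    return ch_branch + st_branch

def tampering_probs(scores):
    """Map raw scores at randomly sampled pixel locations to tampering
    probabilities with a sigmoid, as an auxiliary supervision head
    might before comparison against a ground-truth mask."""
    return 1.0 / (1.0 + np.exp(-scores))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 16, 16))        # C=8, T=4, H=W=16
out = dual_attention(feat)                        # same shape as input
probs = tampering_probs(rng.standard_normal(32))  # 32 sampled pixels
```

In a trained network the attention weights would of course be learned (e.g. from convolutional projections) rather than derived from plain means; the sketch only fixes the data flow of reweight-then-fuse plus an auxiliary per-pixel probability output.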

Key words: face forgery detection, cross-dataset performance, multi-scale spatiotemporal feature, attention mechanism, tampering probability, 3D point cloud reconstruction

CLC number: