Journal of South China University of Technology (Natural Science Edition), 2023, Vol. 51, Issue (5): 70-77. doi: 10.12141/j.issn.1000-565X.220435
Special Issue: 2023 Electronics, Communication & Automatic Control
• Electronics, Communication & Automation Technology •
ZHU Zhengyu1,2 LUO Chao2 HE Qianhua1 PENG Weifeng2 MAO Zhiwei2 ZHANG Shunsi3
Received: 2022-07-08
Online: 2023-05-25
Published: 2022-10-20
Contact: PENG Weifeng (1976-), male, Ph.D., lecturer, whose research focuses on speech signal processing. E-mail: pengweifeng0215@163.com
About author: ZHU Zhengyu (1984-), male, postdoctoral researcher, lecturer, whose research focuses on audio-visual multimodal signal processing. E-mail: zhuzhengyu0701@163.com
ZHU Zhengyu, LUO Chao, HE Qianhua, et al. Multi-View Lip Motion and Voice Consistency Judgment Based on Lip Reconstruction and Three-Dimensional Coupled CNN[J]. Journal of South China University of Technology(Natural Science Edition), 2023, 51(5): 70-77.
Table 3 Comparison of EER and AUC among six methods under different angles before adding frontal reconstruction
Angle/(°) | Overall EER/%: proposed method | Overall EER/%: AV-SISR (K=175) | Overall EER/%: STF | Overall EER/%: AV-SyncNet | Overall EER/%: QMI | Overall EER/%: BLPM | Overall AUC: proposed method | Overall AUC: AV-SISR (K=175) | Overall AUC: STF | Overall AUC: AV-SyncNet | Overall AUC: QMI | Overall AUC: BLPM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8.9 | 15.7 | 14.8 | 11.1 | 20.8 | 19.3 | 0.947 | 0.879 | 0.885 | 0.933 | 0.858 | 0.860 |
30 | 12.3 | 20.2 | 17.1 | 13.2 | 23.3 | 23.1 | 0.920 | 0.857 | 0.871 | 0.905 | 0.815 | 0.819 |
45 | 17.5 | 26.7 | 24.2 | 18.6 | 29.7 | 28.8 | 0.868 | 0.768 | 0.797 | 0.863 | 0.735 | 0.744 |
60 | 26.5 | 33.5 | 31.1 | 29.0 | 36.6 | 34.9 | 0.769 | 0.694 | 0.721 | 0.704 | 0.669 | 0.679 |
90 | 37.1 | 47.1 | 39.8 | 38.3 | 46.7 | 44.5 | 0.665 | 0.589 | 0.644 | 0.659 | 0.592 | 0.613 |
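For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how an equal error rate (EER) and an area under the ROC curve (AUC) can be computed from consistency scores. This page does not state how the authors computed them, so the function name `eer_and_auc`, the threshold sweep, and the toy scores are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def eer_and_auc(scores, labels):
    """Return (EER, AUC) for scores where higher means 'more consistent'.

    labels: 1 = consistent audio/lip pair, 0 = inconsistent pair.
    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]

    # Sweep every observed score as a threshold, plus one above the maximum.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    # A pair is accepted when its score is >= the threshold.
    far = np.array([(neg >= t).mean() for t in thresholds])  # false acceptance rate
    frr = np.array([(pos < t).mean() for t in thresholds])   # false rejection rate

    # EER: take the threshold where FAR and FRR are closest and average them.
    i = np.argmin(np.abs(far - frr))
    eer = (far[i] + frr[i]) / 2.0

    # AUC as the probability that a positive outscores a negative (ties count 0.5).
    diff = pos[:, None] - neg[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)
    return eer, auc

# Toy data (made up): four consistent and four inconsistent pairs.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_eer, toy_auc = eer_and_auc(toy_scores, toy_labels)
print(f"EER = {100 * toy_eer:.1f}%, AUC = {toy_auc:.3f}")
```

A lower EER and a higher AUC both indicate better separation of consistent from inconsistent pairs, which is the direction in which the table should be read.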
Table 4 Comparison of EER and AUC among six methods under different angles after adding frontal reconstruction
Angle/(°) | Overall EER/%: proposed method | Overall EER/%: AV-SISR (K=175) | Overall EER/%: STF | Overall EER/%: AV-SyncNet | Overall EER/%: QMI | Overall EER/%: BLPM | Overall AUC: proposed method | Overall AUC: AV-SISR (K=175) | Overall AUC: STF | Overall AUC: AV-SyncNet | Overall AUC: QMI | Overall AUC: BLPM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | 11.9 | 17.8 | 16.3 | 12.4 | 21.8 | 22.1 | 0.925 | 0.866 | 0.876 | 0.917 | 0.844 | 0.838 |
45 | 14.2 | 20.9 | 18.1 | 15.8 | 24.7 | 23.6 | 0.889 | 0.857 | 0.865 | 0.879 | 0.787 | 0.809 |
60 | 19.1 | 23.7 | 21.6 | 21.7 | 26.4 | 28.6 | 0.861 | 0.807 | 0.848 | 0.846 | 0.771 | 0.747 |
90 | 24.4 | 29.8 | 27.4 | 28.1 | 32.5 | 34.3 | 0.793 | 0.734 | 0.759 | 0.751 | 0.704 | 0.684 |
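The effect of frontal reconstruction on the proposed method can be read off by subtracting the EER values reported after reconstruction from those reported before it. The short sketch below does that arithmetic. The EER figures are copied from Tables 3 and 4, while the helper name `eer_reduction` and the choice to report both absolute and relative drops are illustrative assumptions.

```python
# EER (%) of the proposed method at each non-frontal angle,
# copied from Table 3 (before) and Table 4 (after frontal reconstruction).
eer_before = {30: 12.3, 45: 17.5, 60: 26.5, 90: 37.1}
eer_after = {30: 11.9, 45: 14.2, 60: 19.1, 90: 24.4}

def eer_reduction(before, after):
    """Map angle -> (absolute drop in points, relative drop in %), rounded to 0.1."""
    out = {}
    for angle in sorted(before):
        drop = before[angle] - after[angle]
        out[angle] = (round(drop, 1), round(100.0 * drop / before[angle], 1))
    return out

reductions = eer_reduction(eer_before, eer_after)
for angle, (abs_drop, rel_drop) in reductions.items():
    print(f"{angle:>2} deg: EER down {abs_drop} points ({rel_drop}% relative)")
```

For the proposed method the absolute gain grows with the viewing angle, from 0.4 percentage points at 30° to 12.7 percentage points at 90°.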