Abstract: Deepfake technology has advanced rapidly, enabling the creation of highly realistic manipulated images, videos, and audio. Although current research has made considerable progress in detection, most approaches concentrate on a single modality. This paper analyses more than 20 state-of-the-art studies on deepfake detection and identifies significant research gaps, including the absence of multi-modal frameworks, dataset limitations, poor robustness, and insufficient interpretability. To address these issues, we built a prototype single-modality (image-based) detection system that employs two models: a custom Convolutional Neural Network (CNN) and the Xception CNN. Our findings underscore the need for solutions that incorporate multiple modalities. We propose an integrated multi-modal detection framework encompassing images, videos, and audio as the next step toward reliable and effective detection systems.
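The image-based prototype described above could be sketched as a transfer-learning binary classifier on the Xception backbone. This is a hypothetical illustration, not the authors' actual implementation: the head layers, input size, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of an Xception-based real/fake image classifier,
# assuming a standard Keras transfer-learning setup. Layer sizes and
# training settings are illustrative, not the paper's configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_xception_detector(input_shape=(299, 299, 3)):
    # Xception backbone used as a frozen feature extractor
    # (weights=None here to keep the sketch self-contained; in practice
    # weights="imagenet" would be a typical starting point).
    backbone = tf.keras.applications.Xception(
        include_top=False, weights=None, input_shape=input_shape)
    backbone.trainable = False

    model = models.Sequential([
        backbone,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # 1 = fake, 0 = real
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

In a typical pipeline, the model would then be trained on face crops from a labelled dataset via `model.fit(...)`; the custom CNN mentioned in the abstract would follow the same binary-classification setup with a hand-designed convolutional stack in place of the Xception backbone.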

Keywords: Deepfake detection, CNN, Captioned, multi-modal system, video and audio forensics.


DOI: 10.17148/IJARCCE.2025.141065

How to Cite:

[1] Adesh Borude, Nikam Abhishek, Waghmode Vaibhav, Mayur Gavhane, Prof. B.Y. Baravkar, Prof. R. S. Gandhi, "A Comprehensive Review and Prototype Implementation for Deepfake Detection System using Multi-Modal," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.141065
