Abstract: The proliferation of deepfake audio poses significant challenges, including the erosion of trust in digital communications and heightened risks of fraud and misinformation. This paper presents EchoVerify, a robust detection framework integrating Mel-Frequency Cepstral Coefficients (MFCC) and Speech Emotion Recognition (SER). EchoVerify extracts MFCC-based audio features and applies Convolutional Neural Networks (CNNs) to identify synthetic manipulations with high accuracy. The model outperforms existing approaches in noisy conditions, making it a useful tool for applications requiring audio authentication, such as cybersecurity and digital forensics.
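The abstract names MFCC extraction as the front end of the detection pipeline. As background, the sketch below computes MFCC features from a raw waveform using only NumPy (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II). This is the standard textbook MFCC recipe, not the authors' exact implementation; all parameter values (frame length, hop, filter count) are common defaults chosen here for illustration.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filterbank over the rFFT bins (standard construction)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix for a mono waveform."""
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel filterbank energies
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energies = np.log(spec @ fb.T + 1e-10)
    # DCT-II to decorrelate, keeping the first n_coeffs cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2 * n_filters)))
    return log_energies @ dct.T
```

The resulting per-frame coefficient matrix is what a downstream classifier (a CNN or Random Forest, as in the paper) would consume, typically after normalization.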
Keywords: Deepfake audio, EchoVerify, MFCC, Random Forest, SVM, emotion detection, audio authentication, synthetic speech, digital security, misinformation prevention.
DOI: 10.17148/IJARCCE.2025.14438
[1] Smita Chunamari, Pranali Lembhe, Basundhara Maity, Sanika Sawant, Srinidhi Tekumalla, "EchoVerify: Deepfake Audio Detection Leveraging MFCC and Random Forest Techniques," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.14438