Abstract: The proliferation of deepfake audio poses significant challenges, including the erosion of trust in digital communications and heightened risks of fraud and misinformation. This paper presents EchoVerify, a robust detection framework that integrates Mel-Frequency Cepstral Coefficients (MFCC) and Speech Emotion Recognition (SER). EchoVerify uses Convolutional Neural Networks (CNNs) to learn from these audio features and identify synthetic manipulations with high accuracy. Our model outperforms existing approaches under noisy conditions, making it a valuable tool for applications requiring audio authentication, such as cybersecurity and digital forensics.
Keywords: Deepfake audio, EchoVerify, MFCC, Random Forest, SVM, emotion detection, audio authentication, synthetic speech, digital security, misinformation prevention.
DOI: 10.17148/IJARCCE.2025.14438
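To make the pipeline the abstract describes concrete, below is a minimal sketch of an MFCC-plus-CNN deepfake-audio classifier. It assumes librosa for feature extraction and TensorFlow/Keras for the network; the sample rate, coefficient count, frame length, and layer sizes are illustrative placeholders, not EchoVerify's actual configuration, and SER features would be concatenated alongside the MFCCs in the full system.

```python
# Sketch of an MFCC + CNN classifier for real vs. synthetic speech.
# Assumptions (not from the paper): librosa for MFCCs, Keras for the CNN;
# all hyperparameters below are placeholders.
import numpy as np
import librosa
import tensorflow as tf

SR = 16_000   # assumed sample rate
N_MFCC = 13   # assumed number of MFCC coefficients
FRAMES = 128  # assumed fixed number of time frames per clip

def mfcc_features(path: str) -> np.ndarray:
    """Load a clip and return a fixed-size (N_MFCC, FRAMES, 1) MFCC map."""
    y, _ = librosa.load(path, sr=SR)
    m = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every clip matches the CNN input.
    if m.shape[1] < FRAMES:
        m = np.pad(m, ((0, 0), (0, FRAMES - m.shape[1])))
    return m[:, :FRAMES, np.newaxis]

def build_cnn() -> tf.keras.Model:
    """Small binary CNN: real (0) vs. synthetic (1) speech."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MFCC, FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```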