Abstract: With the rapid advancement of Artificial Intelligence, deepfake audio generation has become increasingly realistic and difficult to identify. These synthetic voices can be misused for fraud, impersonation, political manipulation, and privacy violations. Traditional audio verification systems based on manual inspection or basic acoustic features are not sufficient to detect these sophisticated manipulations.
This research introduces a machine learning-based audio deepfake detection system that analyzes speech signals to distinguish real from synthetic audio. The proposed model uses a hybrid CNN+LSTM architecture trained on Mel-spectrogram representations of audio clips, and detects voice cloning with high accuracy across different speakers and recording environments.
Developed in Python using Librosa, TensorFlow/Keras, and scikit-learn, the system processes uploaded audio files and outputs a prediction label ("Real Audio" or "Fake Audio"). Experimental results show strong performance with few false detections, making the system suitable for security, forensics, and media authentication tasks.
Keywords: Deepfake Audio, Voice Cloning, CNN-LSTM, Machine Learning, Speech Analysis, Fake Audio Detection, Mel-Spectrogram.
DOI: 10.17148/IJARCCE.2025.1411115
[1] Rohit Pravin Pawar, Prof. K. S. Bhave, Prof. Manoj V. Nikum, "Audio Deepfake Detection Using Machine Learning," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.1411115.