Volume 15, Issue 5, May 2026
This work is licensed under a Creative Commons Attribution 4.0 International License.
AI-Powered Detection of Deepfake Audio in Hindi and Kannada Using Speech Analysis
Mrs. Kavitha K S, Chaitanya C Gowda, D Yashwanth, Dheeraj R, Lishanth N
Abstract: The exponential growth of generative artificial intelligence has enabled the mass production of deepfake audio—synthetic speech crafted to replicate the vocal identity of real individuals. Such fabricated audio introduces severe threats to financial security, democratic discourse, biometric authentication, and the credibility of legal evidence. Despite extensive research in English-centric audio forensics, Indian regional languages, specifically Hindi and Kannada, remain substantially underrepresented in the literature. This paper presents a comprehensive survey of existing deepfake audio detection techniques, analyses critical research gaps pertaining to Indian regional languages, and proposes an AI-powered detection framework tailored to Hindi and Kannada speech. The proposed system employs a Convolutional Neural Network (CNN) and Transformer encoder hybrid to jointly model local spectral patterns and long-range temporal dependencies in audio signals. A custom multilingual dataset is constructed from real speech corpora supplemented with synthesized audio generated via Google TTS, Coqui TTS, and Bark. Acoustic features including Mel-Frequency Cepstral Coefficients (MFCC), mel-spectrograms, chroma, and prosodic descriptors are extracted using the Librosa toolkit. The model performs binary classification—Real versus Fake—with performance assessed through Accuracy, Equal Error Rate (EER), False Acceptance Rate (FAR), and False Rejection Rate (FRR). A real-time Flask/Streamlit web interface enables non-technical users to upload audio and receive instant detection results alongside a confidence score.
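As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how FAR, FRR, and an approximate EER could be computed from classifier scores. It is not the authors' implementation. The function names `far_frr` and `equal_error_rate`, the convention that a higher score means "more likely real", and the threshold scan over observed scores are all assumptions made for this example.

```python
def far_frr(scores, labels, threshold):
    """Return (FAR, FRR) at a given decision threshold.

    Assumed conventions (not taken from the paper):
      - scores: higher means "more likely real"
      - labels: 1 for real audio, 0 for fake audio
      - a clip is accepted as real when score >= threshold
    """
    fake = [s for s, y in zip(scores, labels) if y == 0]
    real = [s for s, y in zip(scores, labels) if y == 1]
    if not fake or not real:
        raise ValueError("need at least one real and one fake example")
    # FAR: fraction of fake clips wrongly accepted as real.
    far = sum(s >= threshold for s in fake) / len(fake)
    # FRR: fraction of real clips wrongly rejected as fake.
    frr = sum(s < threshold for s in real) / len(real)
    return far, frr


def equal_error_rate(scores, labels):
    """Approximate the EER by scanning thresholds at the observed scores.

    Returns (eer, threshold), where the threshold is the first candidate
    that minimises |FAR - FRR| and eer is the mean of FAR and FRR there.
    """
    # One extra candidate above the maximum score rejects every clip.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best = None
    for t in candidates:
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores for four real and four fake clips (illustrative values only).
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    eer, thr = equal_error_rate(scores, labels)
    print(f"EER = {eer:.2f} at threshold {thr:.2f}")
```

On a finite test set FAR and FRR rarely cross exactly, so the sketch reports the point where they are closest. A finer estimate would interpolate between adjacent thresholds.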
Keywords: Deepfake audio detection, Hindi speech, Kannada speech, CNN-Transformer, MFCC, mel-spectrogram, Indian language forensics, voice cloning, binary classification, EER
How to Cite:
[1] Mrs. Kavitha K S, Chaitanya C Gowda, D Yashwanth, Dheeraj R, Lishanth N, “AI-Powered Detection of Deepfake Audio in Hindi and Kannada Using Speech Analysis,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.155229
