International Journal of Advanced Research in Computer and Communication Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
Volume 15, Issue 5, May 2026

AI-Powered Detection of Deepfake Audio in Hindi and Kannada Using Speech Analysis

Mrs. Kavitha K S, Chaitanya C Gowda, D Yashwanth, Dheeraj R, Lishanth N

Abstract: The rapid growth of generative artificial intelligence has enabled the mass production of deepfake audio: synthetic speech crafted to replicate the vocal identity of real individuals. Such fabricated audio poses severe threats to financial security, democratic discourse, biometric authentication, and the credibility of legal evidence. Despite extensive research in English-centric audio forensics, Indian regional languages, specifically Hindi and Kannada, remain substantially underrepresented in the literature. This paper presents a comprehensive survey of existing deepfake audio detection techniques, analyses critical research gaps pertaining to Indian regional languages, and proposes an AI-powered detection framework tailored to Hindi and Kannada speech. The proposed system employs a hybrid of a Convolutional Neural Network (CNN) and a Transformer encoder to jointly model local spectral patterns and long-range temporal dependencies in audio signals. A custom multilingual dataset is constructed from real speech corpora supplemented with synthesized audio generated via Google TTS, Coqui TTS, and Bark. Acoustic features, including Mel-Frequency Cepstral Coefficients (MFCC), mel-spectrograms, chroma, and prosodic descriptors, are extracted using the Librosa library. The model performs binary classification (Real versus Fake), with performance assessed through Accuracy, Equal Error Rate (EER), False Acceptance Rate (FAR), and False Rejection Rate (FRR). A real-time Flask/Streamlit web interface enables non-technical users to upload audio and receive instant detection results alongside a confidence score.
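
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes each clip receives a score where higher means "more likely Real", that labels are 1 for Real and 0 for Fake, and that EER is approximated by sweeping thresholds over the observed scores. The function names `far_frr`, `equal_error_rate`, and `accuracy` are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Tuple


def far_frr(scores: Sequence[float], labels: Sequence[int],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a clip is accepted as Real if score >= threshold.

    Assumed convention: label 1 = Real, label 0 = Fake.
    FAR = share of Fake clips accepted as Real.
    FRR = share of Real clips rejected as Fake.
    """
    fake = [s for s, y in zip(scores, labels) if y == 0]
    real = [s for s, y in zip(scores, labels) if y == 1]
    if not fake or not real:
        raise ValueError("need at least one Real and one Fake sample")
    far = sum(s >= threshold for s in fake) / len(fake)
    frr = sum(s < threshold for s in real) / len(real)
    return far, frr


def equal_error_rate(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the EER by a threshold sweep.

    Tries every observed score (plus one value above the maximum) as a
    threshold and keeps the first one where |FAR - FRR| is smallest.
    Returns (eer, threshold), with eer taken as the mean of FAR and FRR there.
    """
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best_gap, best_eer, best_t = None, 0.0, candidates[0]
    for t in candidates:
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float) -> float:
    """Share of clips whose thresholded decision matches the label."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration (not results from the paper).
    demo_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    demo_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(far_frr(demo_scores, demo_labels, 0.5))
    print(equal_error_rate(demo_scores, demo_labels))
    print(accuracy(demo_scores, demo_labels, 0.5))
```

With finite data FAR and FRR rarely cross exactly, so the sweep reports the nearest point. Interpolating between adjacent thresholds is a common refinement that this sketch omits.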

Keywords: Deepfake audio detection, Hindi speech, Kannada speech, CNN-Transformer, MFCC, mel-spectrogram, Indian language forensics, voice cloning, binary classification, EER

How to Cite:

[1] Mrs. Kavitha K S, Chaitanya C Gowda, D Yashwanth, Dheeraj R, Lishanth N, “AI-Powered Detection of Deepfake Audio in Hindi and Kannada Using Speech Analysis,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.155229

This work is licensed under a Creative Commons Attribution 4.0 International License.