Abstract: This system bridges the communication gap between the deaf and the hearing population by translating spoken language into Indian Sign Language (ISL) video streams, making spoken content more accessible. It begins with Whisper ASR, a transformer-based automatic speech recognition model that accurately transcribes speech to text. The transcribed text is then processed with Natural Language Processing (NLP) methods, such as tokenization, part-of-speech tagging, stop-word removal, and lemmatization, to prepare its structure for ISL translation. To further improve compatibility with ISL grammar, a Sequence-to-Sequence (Seq2Seq) model built on Recurrent Neural Networks (RNNs) restructures sentences to produce fluent, natural translations. The restructured text is then mapped to pre-recorded ISL video clips, and MoviePy performs the seamless stitching and synchronization of the sign segments. A Flask-based web interface offers users a simple platform to upload audio files, enter text, and generate ISL videos in real time. The system is optimized for efficiency and ease of use, and it can be applied in education, healthcare, government services, and customer support. Future developments will emphasize expanding the ISL vocabulary dataset, optimizing real-time processing, adding mobile compatibility, and deploying to cloud platforms for improved scalability. These enhancements will make the system more efficient, accurate, and accessible, further improving communication for the deaf and hard-of-hearing.
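As a concrete illustration of the pipeline the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes openai-whisper, NLTK, and MoviePy 1.x are installed, stubs out the Seq2Seq reordering step with an identity function, and assumes a hypothetical directory layout in which each ISL gloss has a pre-recorded clip at isl_clips/<gloss>.mp4.

```python
# Illustrative sketch of the speech-to-ISL pipeline (not the authors' code).
# Assumptions: openai-whisper, nltk, and moviepy 1.x installed; one clip per
# gloss at isl_clips/<gloss>.mp4 (hypothetical path); the Seq2Seq reordering
# model is replaced here by an identity stub.
# First run may require: nltk.download("punkt"), nltk.download("stopwords"),
# nltk.download("wordnet"), nltk.download("averaged_perceptron_tagger").
import os

import whisper
from moviepy.editor import VideoFileClip, concatenate_videoclips
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer


def transcribe(audio_path: str) -> str:
    """Step 1: Whisper ASR transcribes speech to text."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]


def to_glosses(text: str) -> list[str]:
    """Step 2: tokenize, POS-tag, drop stop words, and lemmatize into glosses."""
    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    tagged = pos_tag(word_tokenize(text))  # POS tags would feed the reordering model
    return [
        lemmatizer.lemmatize(word.lower())
        for word, _tag in tagged
        if word.isalpha() and word.lower() not in stops
    ]


def reorder_for_isl(glosses: list[str]) -> list[str]:
    """Step 3: placeholder for the Seq2Seq (RNN) ISL-grammar reordering model."""
    return glosses  # identity stub; a trained model would restructure the sequence


def render_isl_video(glosses: list[str], out_path: str = "isl_output.mp4") -> None:
    """Step 4: stitch the per-gloss ISL clips into one video with MoviePy."""
    paths = [f"isl_clips/{g}.mp4" for g in glosses]
    clips = [VideoFileClip(p) for p in paths if os.path.exists(p)]
    if clips:
        concatenate_videoclips(clips, method="compose").write_videofile(out_path)


if __name__ == "__main__":
    render_isl_video(reorder_for_isl(to_glosses(transcribe("input.wav"))))
```

The Flask web layer described in the abstract would simply wrap this pipeline behind an upload route; it is omitted here for brevity.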
Keywords: Speech-to-Sign Language, Indian Sign Language (ISL), Seq2Seq Model, Natural Language Processing (NLP), Audio-to-Video Conversion, Gesture Recognition, Whisper ASR, Deep Learning.
DOI: 10.17148/IJARCCE.2025.14388