Abstract: Despite rapid progress in artificial intelligence, communication between hearing and non-hearing individuals still faces significant challenges. This research proposes a real-time sign language translation system that converts hand gestures into readable text using a hybrid YOLOv5-Mediapipe-PyTorch architecture. The framework leverages NVIDIA CUDA for accelerated inference and OpenCV for image preprocessing and display. The convolutional model is trained through transfer learning on a curated Indian Sign Language (ISL) dataset containing 26 alphabetic and multiple word-level gestures. The system achieves 96.2% accuracy and runs in real time on standard GPU hardware. Translated text is rendered as live subtitles via the OBS virtual camera, enabling accessibility on conferencing platforms such as Google Meet and Microsoft Teams. Experimental evaluation confirms that the YOLOv5-Mediapipe hybrid substantially reduces latency while maintaining high precision. This work demonstrates a scalable path toward inclusive communication technology, bridging the gap between hearing-impaired users and those who do not use sign language.
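
For illustration only, the following minimal Python sketch shows one plausible way the YOLOv5 detector, Mediapipe hand tracking, PyTorch/CUDA inference, and OpenCV display described above could be wired together. It is not the authors' released code: the weight file name (isl_yolov5.pt), the confidence thresholds, and the choice of invoking YOLOv5 only when Mediapipe reports a visible hand are assumptions made for the example.

# Illustrative sketch of a hybrid YOLOv5 + Mediapipe gesture-to-text loop.
# Assumptions: "isl_yolov5.pt" is a transfer-learned checkpoint whose class
# names are the ISL gesture labels; all thresholds are placeholders.
import cv2
import torch
import mediapipe as mp

device = "cuda" if torch.cuda.is_available() else "cpu"   # CUDA-accelerated inference when available
model = torch.hub.load("ultralytics/yolov5", "custom", path="isl_yolov5.pt").to(device)
hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)           # OpenCV preprocessing (BGR -> RGB)

    # Run the lightweight Mediapipe hand tracker first and call the heavier
    # YOLOv5 detector only when a hand is present -- one way a hybrid
    # pipeline can reduce per-frame latency.
    if hands.process(rgb).multi_hand_landmarks:
        detections = model(rgb).pandas().xyxy[0]
        for i, (_, det) in enumerate(detections.iterrows()):
            if det["confidence"] > 0.6:
                cv2.putText(frame, det["name"], (30, 40 + 40 * i),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)

    cv2.imshow("ISL translator", frame)                    # frame that an OBS virtual camera could capture
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

In a deployment such as the one described in the abstract, the annotated frames would be routed through the OBS virtual camera so that conferencing platforms receive the subtitled video feed.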

Keywords: Sign Language Recognition, Deep Learning, YOLOv5, Mediapipe, CUDA, PyTorch, Real-Time Translation.



How to Cite:

[1] Prof. Diksha Bansod, Vinit Pawankar, Sumit Ghoshal, Riya Patel, Himanshu Dhande, Shubham Jadhao, "A Real-Time Deep Learning-Based Sign Language Translator to Text Using YOLOv5 and Mediapipe," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.141045
