Abstract: Sign language is a vital medium of communication for individuals with hearing and speech impairments, but limited familiarity with signing among non-signers creates communication barriers. This project proposes a real-time sign language recognition system that recognizes letters (A-Z) and digits (0-9) from webcam video. The system combines modern deep learning techniques with web technologies to provide an accurate, fast, and user-friendly solution. The frontend uses WebRTC to capture video streams directly in the browser, making the system platform-independent and usable with any standard laptop or external camera. The backend uses FastAPI with WebSockets for real-time communication between the browser and the deep learning model, ensuring low-latency predictions.
The recognition model uses EfficientNet (via transfer learning) for spatial feature extraction, followed by a recurrent stage that captures temporal patterns across frames. An attention mechanism improves performance by weighting the most informative frames, and a GRU classifier predicts the final letter or digit with high accuracy. Training and validation use benchmark datasets along with self-collected samples to ensure adaptability to real-world conditions. The prototype displays recognized signs as text beneath the video feed, with emphasis on accuracy, robustness, and real-time performance for applications in education, healthcare, and accessibility services.
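The temporal attention step can be illustrated with a small NumPy sketch: each per-frame feature vector (e.g. from the EfficientNet backbone) gets a scalar score, the scores are softmax-normalized over time, and the weighted sum is the clip-level summary passed on to the classifier. Shapes and the learned score vector `w` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def attention_pool(frames: np.ndarray, w: np.ndarray) -> np.ndarray:
    """frames: (T, D) per-frame features; w: (D,) learned scoring vector.
    Returns a (D,) attention-weighted summary of the clip."""
    scores = frames @ w                              # (T,) one score per frame
    scores = scores - scores.max()                   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # softmax over time
    return alpha @ frames                            # weighted sum of frames

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))        # e.g. 16 frames, 8-dim features
context = attention_pool(feats, rng.normal(size=8))  # (8,) clip summary
```

In the full model, `alpha` is what lets the network focus on the informative frames of a gesture, and `context` (or the attended sequence) would feed the GRU classifier head.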
Keywords: Sign Language Recognition, Real-Time Gesture Recognition, EfficientNet, CNN-GRU Hybrid Model, Attention Mechanism, Spatial-Temporal Feature Extraction, WebRTC, Low-Latency Inference
DOI: 10.17148/IJARCCE.2025.141175
[1] Prof. Minal Patil, Rhushabh Gaikwad, Rushikesh Ghogare, Shekhar Khandale, Roshan Avhad, "Deep Learning Based Real-Time Sign Language Recognition," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2025.141175