Abstract: Gesture recognition for voice synthesis is an emerging assistive technology that enables communication through hand gestures by converting them into synthesized speech. This approach is especially beneficial for individuals with speech or hearing impairments, providing them with an alternative medium of interaction. With advances in computer vision and deep learning, gesture-based interfaces have become more accurate and efficient.
This paper presents a Gesture Recognition for Voice Synthesis System that uses computer vision and deep learning techniques to recognize predefined hand gestures in real time and convert them into corresponding voice outputs. A Convolutional Neural Network (CNN) is employed for gesture classification after preprocessing steps such as image resizing, background normalization, and feature extraction. Once a gesture is recognized, a text-to-speech module generates an appropriate voice output.
The system is implemented as a real-time application using a camera interface, allowing users to perform gestures naturally. Experimental evaluation shows high recognition accuracy and low response latency, demonstrating the effectiveness of the proposed system for assistive communication and human–computer interaction applications.
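The pipeline described above (capture a frame, preprocess it, classify the gesture with a CNN, then speak the matching phrase) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the gesture vocabulary, the 64×64 input size, and the helper names are assumptions for demonstration, and the CNN and camera/TTS stages are represented by stubs (in practice a trained Keras/PyTorch model and a TTS engine such as pyttsx3 would fill those roles).

```python
import numpy as np

# Hypothetical gesture-to-phrase vocabulary; the paper's actual gesture
# set is not specified in the abstract.
GESTURE_PHRASES = {0: "hello", 1: "thank you", 2: "help", 3: "yes", 4: "no"}

def preprocess(frame: np.ndarray, size: int = 64) -> np.ndarray:
    """Resize a grayscale frame (nearest-neighbour) and normalize to [0, 1],
    mirroring the abstract's resizing/normalization steps."""
    h, w = frame.shape
    ys = np.arange(size) * h // size   # row indices to sample
    xs = np.arange(size) * w // size   # column indices to sample
    return frame[np.ix_(ys, xs)].astype(np.float32) / 255.0

def classify(image: np.ndarray) -> int:
    """Stand-in for the trained CNN classifier: returns a class label.
    A real system would run model.predict(image) here."""
    return int(image.mean() * len(GESTURE_PHRASES)) % len(GESTURE_PHRASES)

def phrase_for(label: int) -> str:
    """Map a predicted class label to the phrase the TTS module would speak."""
    return GESTURE_PHRASES.get(label, "")

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # fake camera frame
    x = preprocess(frame)
    print(phrase_for(classify(x)))
```

In a live application the fake frame would come from a camera loop (e.g. OpenCV's `VideoCapture`), and the returned phrase would be handed to the text-to-speech engine rather than printed.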

Keywords: Gesture Recognition, Voice Synthesis, Computer Vision, Deep Learning, CNN, Assistive Technology


DOI: 10.17148/IJARCCE.2026.151138

How to Cite:

[1] Bhargav K, Sandarsh Gowda M M, "Gesture Recognition for Voice Synthesis," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.151138.
