Volume 15, Issue 4, April 2026
This work is licensed under a Creative Commons Attribution 4.0 International License.
Multilingual AI-Based Voice-Controlled Robotic System Using Distributed Architecture
Abstract: Human–robot interaction is evolving rapidly with the convergence of artificial intelligence, cloud computing, and embedded systems. This paper presents the design and implementation of Astra, a multilingual AI-driven voice-controlled robotic system that operates through a distributed architecture. High-level natural language intelligence runs on a laptop, while real-time motor control is managed by an ESP32 microcontroller. The system supports three languages (English, Hindi, and Telugu), enabling broad accessibility across India’s linguistically diverse population. Audio input is captured via a wired microphone and transcribed by Sarvam AI, a cloud-based speech recognition service optimized for Indian languages.
The transcribed text is forwarded via OpenRouter to a GPT-4o-mini large language model, which classifies the input as either a movement command or a general conversational query and generates a structured JSON response. Movement commands are transmitted from the laptop to the ESP32 over Wi-Fi using HTTP, while conversational answers are spoken aloud via gTTS. A soft wake-word mechanism (“Astra”) enhances usability without making the keyword mandatory.
Experimental evaluation demonstrates an average speech recognition accuracy of 85% across all three languages, end-to-end command latency under 2 s, and robust motor control with no packet loss over a local Wi-Fi
Keywords: Human–Robot Interaction, Multilingual Speech Recognition, Sarvam AI, GPT-4o-mini, ESP32, Distributed AI Architecture, Wi-Fi HTTP Control, Natural Language Understanding, Voice-Controlled Robot, IoT.
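The routing step described in the abstract (a soft wake word, a structured JSON reply from the language model, and a split between HTTP movement commands and spoken answers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the JSON field names (`type`, `action`, `text`), the action set, the `/move` endpoint, and the function names are all hypothetical, since the abstract does not specify them. The wake-word pattern also only handles the Latin-script spelling of "Astra".

```python
import json
import re
from urllib.parse import urlencode

# Hypothetical action set; the abstract does not list the actual command names.
ALLOWED_ACTIONS = {"forward", "backward", "left", "right", "stop"}

# "Soft" wake word: a leading "Astra" (optionally "Hey Astra") is removed if present.
WAKE_WORD = re.compile(r"^\s*(?:hey\s+)?astra\b[\s,.:!]*", re.IGNORECASE)


def strip_wake_word(text: str) -> str:
    """Drop a leading wake word; text without it passes through unchanged."""
    return WAKE_WORD.sub("", text, count=1).strip()


def build_command_url(host: str, action: str) -> str:
    """Build the URL for a movement request (the /move path is an assumption)."""
    return f"http://{host}/move?{urlencode({'action': action})}"


def route_response(raw: str, send_command, speak) -> str:
    """Dispatch one model reply: movement goes to the robot, the rest is spoken."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        reply = None
    if not isinstance(reply, dict):
        # Malformed or non-object output: fall back to a spoken error message.
        speak("Sorry, I did not understand that.")
        return "error"
    if reply.get("type") == "movement" and reply.get("action") in ALLOWED_ACTIONS:
        send_command(reply["action"])
        return "movement"
    # Anything else is treated as a conversational answer.
    speak(str(reply.get("text", "")))
    return "conversation"
```

In such a setup, `send_command` could wrap an HTTP GET to the URL from `build_command_url` (for example with `urllib.request.urlopen` and a short timeout), and `speak` could wrap a gTTS call. Passing both in as callables keeps the routing logic testable without a robot or a network connection, and checking the action against an allow-list prevents unexpected model output from reaching the motor controller.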
How to Cite:
[1] Mrs. V. Divya Vani, Dr. G. Anand Kumar, K. Dharan, Ch. Tharun, “Multilingual AI-Based Voice-Controlled Robotic System Using Distributed Architecture,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.154126
