Abstract: In the present era of computing, the ability to convert varied inputs into editable, usable text is increasingly essential. The "Multi-Mode Text Converter" project aims to bridge the gap between non-digital inputs and digital documentation through an easy-to-use, accessible, multi-purpose web application. The app is built with Streamlit and provides two primary functions: Image to Text conversion using Optical Character Recognition (OCR) and Voice to Text transcription using speech recognition. The Image to Text (OCR) feature lets users upload images of printed or handwritten (cursive) text and extracts the text with Tesseract OCR, with support for Tamil and English. Preprocessing steps such as grayscale conversion and thresholding are applied to improve image quality and recognition accuracy. For English text, the app also uses TextBlob for automatic spell checking and correction to improve the quality of the output. Extracted or edited text can be exported as a .txt file for later use. The Voice to Text module uses the Google Speech Recognition API to transcribe live voice input captured from a microphone. Users can choose between English and Tamil speech recognition, which supports regional inclusivity and multilingual use. Transcribed text can also be saved for documentation and archival purposes. A key feature of the application is Text-to-Speech (TTS), powered by gTTS (Google Text-to-Speech), which lets users listen to the transcribed or typed text in their chosen language. To improve the user experience, the application provides a graphical user interface with a customized background and an adaptive layout.
By integrating image processing, natural language editing, speech recognition, and voice synthesis, the Multi-Mode Text Converter provides an end-to-end system for digital text creation and extraction. It has potential future applications in education, accessibility software, digital archiving, and simple-to-use data entry systems.
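The two preprocessing steps named in the abstract, grayscale conversion and thresholding, can be illustrated with a minimal, library-free sketch. The abstract does not state which image library or threshold value the application uses, so the function names `to_grayscale` and `binarize`, the fixed threshold of 128, and the BT.601 luminance weights are all assumptions made for illustration; in practice this step would more likely be done with an image-processing library before the result is passed to Tesseract.

```python
# Illustrative sketch only: helper names, the threshold value, and the
# luminance weights are assumptions, not details taken from the paper.
# An image is a list of rows; each pixel is an (R, G, B) tuple in 0..255.

def to_grayscale(image):
    """Convert RGB pixels to a single luminance value (ITU-R BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def binarize(gray, threshold=128):
    """Global thresholding: pixels at or above the threshold become white (255),
    all others become black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]

# A tiny 2x2 sample: light background pixels next to dark "ink" pixels.
sample = [
    [(255, 255, 255), (20, 20, 20)],
    [(200, 180, 160), (90, 60, 30)],
]

gray = to_grayscale(sample)
binary = binarize(gray)
print(binary)  # [[255, 0], [255, 0]]
```

After binarization the light background and dark text are cleanly separated, which is the property that tends to help an OCR engine; a fixed global threshold is the simplest choice and may not suit unevenly lit photographs.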
Keywords- Optical Character Recognition (OCR), Automatic Speech Recognition (ASR), Image to Text, Voice to Text, Tesseract OCR, TextBlob, gTTS, Speech Recognition API, Streamlit, multi-language support, text-to-speech, image preprocessing, digital text conversion, handwritten text recognition, audio transcription, user interface, AI integration.
DOI: 10.17148/IJARCCE.2025.144104