Abstract: Optical character recognition (OCR) is applied to convert printed Kannada text into a machine-readable format. The system extracts text from scanned documents and photographs so that Kannada literature can be digitized and accessed more easily. It recognizes words and characters across numerous typefaces and layouts, including multi-column forms, using a combination of algorithms and machine learning. The implementation is based on the Tesseract OCR engine, which offers strong text recognition accuracy and is well suited to the Kannada script. Experimental results indicate that our approach preserves the integrity of the original text while reducing the human effort required for data entry. This work supports the preservation and dissemination of Kannada-language material and helps meet the growing demand for digital resources in regional languages.
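As an illustration of the Tesseract-based approach summarized above, the following is a minimal sketch, not the authors' implementation. It assumes the Tesseract command-line tool and its Kannada language data (`kan`) are installed; the helper names `build_tesseract_command` and `recognize_kannada`, the default output base name, and the choice of page segmentation mode 3 (fully automatic layout analysis) are illustrative assumptions rather than details from the paper.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="kan", psm=3):
    """Return the argument list for one Tesseract CLI run.

    lang="kan" selects Tesseract's Kannada language data.
    psm=3 requests fully automatic page segmentation, which is
    an assumed (not source-confirmed) setting for multi-column pages.
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-l", lang,
        "--psm", str(psm),
    ]


def recognize_kannada(image_path, output_base="kannada_page"):
    """Run Tesseract on an image and return the recognized text.

    Hypothetical helper: Tesseract writes its result to
    "<output_base>.txt", which is then read back as UTF-8.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `recognize_kannada("scan.png")` on a scanned page would return the extracted Kannada text as a Unicode string, provided the `kan` language data is available to Tesseract.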
DOI: 10.17148/IJARCCE.2025.14130