International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE): a monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
Volume 4, Issue 8, August 2015

Word-wise Script Identification of South Indian Document Images

Smita Biradar, V.S. Malemath, Suneel C Shinde

DOI: 10.17148/IJARCCE.2015.48103

Abstract: A document page may contain text words and numerals in different regional scripts alongside English and/or the national language. Documents from a multilingual country, and especially from border areas, often mix scripts in this way to convey information to a wide audience. A monolingual OCR system fails to recognize words in the other scripts, so script identification becomes essential in such cases. Script identification is one of the challenging steps of an Optical Character Recognition (OCR) system for multi-script documents. In this work we propose a word-wise script identifier covering all the South Indian languages. The proposed method builds its features on morphological operations (dilation, erosion, and reconstruction), and a nearest-neighbor classifier assigns the script. The method proved robust in identifying the script when tested on 600 word document images, with an overall accuracy of 98.1%.
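The abstract names the ingredients (dilation, erosion, morphological reconstruction, and a nearest-neighbor classifier) but not the exact features. The Python sketch below is a minimal illustration under our own assumptions, not the authors' method: word images are binary nested lists, erosion uses hypothetical 1×5 horizontal and 5×1 vertical line structuring elements, reconstruction grows with 3×3 (8-connected) dilation, and `word_features` computes a hypothetical five-value vector (ink density, then for each direction the fraction of ink surviving erosion and the fraction recovered by reconstructing from those survivors). The helper names and the labels `script_A`/`script_B` are illustrative only.

```python
from math import dist
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]          # binary word image: 1 = ink, 0 = background
Offsets = List[Tuple[int, int]]  # structuring element as (row, col) offsets

# Hypothetical structuring elements (all symmetric, so no reflection is needed).
SQUARE3: Offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
HLINE: Offsets = [(0, dc) for dc in range(-2, 3)]   # 1x5 horizontal line
VLINE: Offsets = [(dr, 0) for dr in range(-2, 3)]   # 5x1 vertical line


def _apply(img: Image, se: Offsets, combine: Callable[[Iterable[int]], bool]) -> Image:
    """Slide `se` over `img`; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    return [
        [
            1 if combine(
                img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
                for dr, dc in se
            ) else 0
            for c in range(w)
        ]
        for r in range(h)
    ]


def dilate(img: Image, se: Offsets) -> Image:
    return _apply(img, se, any)  # ink if any covered pixel is ink


def erode(img: Image, se: Offsets) -> Image:
    return _apply(img, se, all)  # ink only if every covered pixel is ink


def reconstruct(marker: Image, mask: Image) -> Image:
    """Reconstruction by dilation: grow `marker` inside `mask` until it stops changing."""
    current = [[m & k for m, k in zip(mr, kr)] for mr, kr in zip(marker, mask)]
    while True:
        grown = dilate(current, SQUARE3)
        nxt = [[g & k for g, k in zip(gr, kr)] for gr, kr in zip(grown, mask)]
        if nxt == current:
            return current
        current = nxt


def count(img: Image) -> int:
    return sum(map(sum, img))


def word_features(img: Image) -> List[float]:
    """Hypothetical feature vector; not the paper's exact features."""
    ink = count(img) or 1                                  # avoid division by zero
    feats = [count(img) / (len(img) * len(img[0]))]        # ink density
    for se in (HLINE, VLINE):
        seed = erode(img, se)                              # long strokes in this direction
        feats.append(count(seed) / ink)                    # fraction surviving erosion
        feats.append(count(reconstruct(seed, img)) / ink)  # strokes connected to survivors
    return feats


def nearest_neighbor(train: List[Tuple[List[float], str]], query: List[float]) -> str:
    """1-NN: return the label of the training vector closest in Euclidean distance."""
    return min(train, key=lambda item: dist(item[0], query))[1]


if __name__ == "__main__":
    # Toy "words": horizontal strokes vs. vertical strokes (labels are hypothetical).
    horiz = [[1 if r in (1, 4) and 1 <= c <= 7 else 0 for c in range(9)] for r in range(7)]
    vert = [[1 if c in (2, 6) and 1 <= r <= 5 else 0 for c in range(9)] for r in range(7)]
    train = [(word_features(horiz), "script_A"), (word_features(vert), "script_B")]

    query = [[1 if r == 3 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(7)]
    print(nearest_neighbor(train, word_features(query)))  # prints script_A
```

The single horizontal stroke in `query` matches the horizontal-stroke training word on every reconstruction feature and differs only in ink density, so the 1-NN rule labels it `script_A`. With real data, `train` would hold feature vectors from labeled word images of each script; reconstruction is used here because it recovers whole strokes attached to the erosion survivors, which makes the features less sensitive to stroke length than erosion alone.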



Keywords: OCR, script identification, morphological reconstruction, multilingual documents, multi-script documents, nearest-neighbor (NN) classifier.

How to Cite:

[1] Smita Biradar, V.S. Malemath, Suneel C Shinde, “Word-wise Script Identification of South Indian Document Images,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), vol. 4, no. 8, August 2015, DOI: 10.17148/IJARCCE.2015.48103