Abstract: Text and strings in images provide additional information. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and varied background interference. Text in natural images can be recognized using a discriminative character descriptor and character structure modeling, but this approach is prone to false recognition and low text recognition accuracy. In this paper, we improve the accuracy of text detection and add lexicon analysis to extend our system to word-level recognition in natural videos. To improve the accuracy and practicality of scene text extraction, we will design more representative and discriminative features to model text structure. This can be achieved by collecting a database of specific scene text words as a stronger training set, for example, a set of word patches such as “EXIT” or “SALE” cropped from scene images. In addition, we will combine scene text extraction with other techniques, such as content-based image retrieval, to develop a more useful vision-based assistant system.

Keywords: Scene text detection, scene text recognition, mobile application, character descriptor, text understanding, text retrieval.