Abstract: The World Wide Web hosts many online databases, and the number keeps growing every day. Data in these web pages is generally wrapped in the form of data records, and such pages are generated dynamically. This paper focuses on extracting data from these web pages. To date, several techniques have been proposed for retrieving information from web pages, but they all share a common problem: dependence on the programming language used to build the pages. This paper therefore uses the visual features of a web page to extract data from deep web pages. To make the system more efficient, the visual features can be combined with non-visual information such as symbols and tags. The proposed approach is independent of any specific web programming language and can therefore be extended to web pages with different underlying architectures.

Keywords: Web mining, Web data extraction, visual features of deep Web pages, wrapper generation