Abstract: The web contains a huge amount of data, which in turn yields a large amount of information. Information on the web takes two forms: i) structured data and ii) unstructured data. In this paper, we focus on structured data. List data is one of the most important sources of structured data for information retrieval on the web. This paper deals with “top-k lists”: web pages that describe a list of k instances of a particular topic or concept. Examples include “the top 5 cars in the world” and “the 10 richest businessmen in the world”. Top-k lists are a rich, large, and high-quality source of information, and are therefore highly valuable. This paper reviews various traditional methods for extracting top-k lists. After studying these, we present an efficient method that obtains the target lists from web pages with high accuracy. Extracting such lists can help enrich existing knowledge bases about general concepts and is useful as a preprocessing step to produce facts for a fact answering engine.

Keywords: Web information extraction, Top-k lists, List extraction, Web mining.