Abstract: Duplicate detection is an important problem in many applications, including customer relationship management, personal information management, and data mining. It is the task of identifying all cases in which multiple representations of the same real-world object exist. A representative example is customer relationship management, where a company loses money by sending multiple catalogs to the same person, which in turn lowers customer satisfaction. Another application is data mining, where correct input data is necessary to construct useful reports that form the basis of decision making.

Keywords: Duplicate Detection, Entity Resolution, Progressiveness, Pay-As-You-Go, Data Cleaning, MapReduce.