Abstract: Data cleaning is the process of detecting and removing errors and inconsistencies in data in order to improve data quality. To achieve high data quality, data quality problems have to be solved. Data quality problems exist in both single-source and multiple-source systems. Single-source problems include errors, inconsistencies, missing values, uniqueness violations, duplicate records, and referential integrity violations. Multiple-source problems include structural conflicts, naming conflicts, and inconsistent timing and aggregation. In this paper, data quality problems such as duplication, missing values, and attribute correction are addressed by implementing different algorithms using data mining techniques.
 

Keywords: Data cleaning, Duplication, Missing data, Attribute correction, Levenshtein distance.
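
The abstract does not give implementation details, but the Levenshtein distance keyword suggests edit-distance matching for duplicate detection and attribute correction. The following minimal sketch, not taken from the paper, illustrates the idea: it computes the edit distance between string fields and flags record pairs that fall within a small threshold. The record layout, field name, and threshold are assumptions made for illustration.

# Illustrative sketch only; the paper's own implementation is not published here.
# Levenshtein (edit) distance used to flag near-duplicate records.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def near_duplicates(records, key="name", max_distance=2):
    """Return pairs of records whose key field differs by at most max_distance edits."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            d = levenshtein(records[i][key].lower(), records[j][key].lower())
            if d <= max_distance:
                pairs.append((records[i], records[j]))
    return pairs

if __name__ == "__main__":
    sample = [{"name": "Jonathan Smith"}, {"name": "Jonathon Smith"}, {"name": "Mary Jones"}]
    print(near_duplicates(sample))  # flags the two near-identical names as likely duplicates
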


DOI: 10.17148/IJARCCE.2018.7612
