Abstract: Large datasets are mined to extract hidden information and patterns that help decision makers reach effective, well-organized, and timely decisions in an increasingly competitive world. This kind of knowledge-driven data mining is impossible unless dataset owners share their datasets with data mining experts or corporations. As a result, protecting ownership by embedding a watermark in the datasets is becoming increasingly relevant. The main challenge in watermarking datasets that are to be mined is how to preserve the knowledge carried by their features or attributes. The owner must manually define usability constraints for each type of dataset to protect the knowledge it contains. The main contribution is a novel formal model that enables a data owner to define usability constraints in an automated manner, so that the knowledge contained in the dataset is preserved. The formal model aims to preserve the “classification potential” of each feature, along with other key characteristics of the dataset that play an important role in the data mining process; as a result, learning statistics and decision-making rules also remain intact. I will implement the model and integrate it with a new watermark embedding algorithm to demonstrate that the inserted watermark not only preserves the knowledge contained in the dataset but also significantly increases watermark security compared with existing systems.

Keywords: knowledge-preserving and ownership-preserving data mining, data usability, watermarking datasets, rights protection.