Abstract: Privacy is a growing concern in the publication of datasets that contain sensitive information, while providing useful information to users for data mining remains the main goal of such publication. Generalization and randomized response methods were proposed in the database community to tackle this problem. Both methods face the same barrier: they usually require the publisher to control the trade-off between privacy and data quality, which may put data publishers in a dilemma. In this paper, a novel privacy-preserving method for data publication is proposed, based on conditional probability distributions and machine learning techniques, which can apply different criteria to different transactions. A basic cross sampling algorithm and a complete cross sampling algorithm are designed for the settings of a single sensitive attribute and multiple sensitive attributes, respectively, and an improved complete algorithm is developed using Gibbs sampling in order to enhance data utility when data are insufficient. Compared with many other methods, the proposed method is intended to provide stronger privacy and better data utility.
Keywords: Data publication, Privacy preservation, Data utility, Cross sampling, Gibbs sampling
DOI: 10.17148/IJARCCE.2020.9305