Abstract: A large number
of cloud services require users to share private data like electronic health
records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements
such as k-anonymity is a widely used category of privacy-preserving
techniques. At present, the scale of data in many cloud applications
increases tremendously in accordance with the big data trend, thereby making it
a challenge for commonly used software tools to capture, manage, and process
such large-scale data within a tolerable elapsed time. As a result, it is a
challenge for existing anonymization approaches to achieve privacy preservation on
privacy-sensitive large-scale data sets due to their insufficiency of
scalability. In this paper, we propose a scalable two-phase top-down
specialization (TDS) approach to anonymize large-scale data sets using the
MapReduce framework on cloud. In both phases of our approach, we
deliberately design a group of innovative MapReduce jobs to concretely
accomplish the specialization computation in a highly scalable way.
Keywords: Data anonymization, top-down specialization, MapReduce, cloud, privacy preservation
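
To make the notion of k-anonymity via generalization concrete, the following is a minimal single-machine sketch written in a map-and-reduce style. It is not the MapReduce jobs designed in this paper. The records, the quasi-identifiers (age and ZIP code), and the function names `generalize`, `map_phase`, `reduce_phase`, and `is_k_anonymous` are all hypothetical and chosen only for illustration. The sketch checks whether a given generalization level leaves every quasi-identifier group with at least k records; a top-down specialization procedure would start from the most general level and refine it only while such a check still holds.

```python
from collections import Counter

# Hypothetical records: (age, zip_code) act as quasi-identifiers.
RECORDS = [
    (23, "47677"), (27, "47602"), (29, "47678"),
    (41, "47905"), (45, "47909"), (48, "47906"),
]


def generalize(record, age_width, zip_digits):
    """Replace exact values with coarser ones (an age range and a ZIP prefix)."""
    age, zip_code = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, zip_prefix)


def map_phase(records, age_width, zip_digits):
    """Emit one (generalized group, 1) pair per record."""
    return [(generalize(r, age_width, zip_digits), 1) for r in records]


def reduce_phase(pairs):
    """Sum the counts for each generalized group."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def is_k_anonymous(records, k, age_width, zip_digits):
    """True if every generalized group contains at least k records."""
    counts = reduce_phase(map_phase(records, age_width, zip_digits))
    return all(count >= k for count in counts.values())
```

With these sample records, a coarse level such as `is_k_anonymous(RECORDS, 3, 20, 3)` groups the data into two groups of three records each, while a fine level such as `is_k_anonymous(RECORDS, 2, 5, 5)` leaves every record in its own group and therefore fails the check.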