Abstract: Everyday activity generates a large volume of data and files. Every organization wants to store and maintain this information efficiently, where efficiency means using storage space well, keeping the organization's records secure, and accessing records quickly. Among the files maintained on an organization's server, some are stored more than once. These duplicates increase memory utilization and eventually lead to insufficient storage space. Outgoing data on a network connection may also contain duplicate files, which adds network traffic and reduces transfer speed. To address these problems, we propose a data deduplication technique that uses a machine learning algorithm based on string comparison to detect redundant data and files. To ensure security, we also propose a hybrid authentication approach.

Keywords: Cloud Services, Deduplication, Greedy Approach, String Comparison

