Abstract: This article presents a detailed survey of k-means clustering algorithms for very large and high-dimensional datasets, together with an overview of the clustering schemes used in data mining. The k-means algorithm and its variations are well known to be fast clustering algorithms. However, they are sensitive to the choice of initial points and are inefficient for solving clustering problems in very large datasets. Recently, incremental schemes have been developed to resolve the difficulties with the choice of initial points. The global k-means (GKM) and fast global k-means (FGKM) algorithms are based on such a scheme: they iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve on the k-means algorithm. However, they require either storing the whole affinity matrix or recomputing it at every step of the algorithm. This makes both algorithms time-consuming and memory-demanding for clustering even moderately large datasets. We give a comparative study of different k-means algorithms so that they can be understood and compared effectively.
Keywords: high-dimensional datasets, clustering, k-means, GKM.
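
The incremental scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not code from this article: every function name below is hypothetical, the data are plain tuples of numbers, and the FGKM candidate rule follows the commonly described bound on the guaranteed error reduction, which we assume here. GKM tries each data point as the new center and runs k-means once per candidate. FGKM runs k-means once per added center but needs all pairwise distances, which is the affinity-matrix cost mentioned above.

```python
# Illustrative sketch only; names are hypothetical, not from the article.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sse(data, centers):
    """Clustering error: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)

def kmeans(data, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers

def global_kmeans(data, k):
    """GKM: add one center at a time, trying every data point as the new one."""
    centers = [mean(data)]  # the best single center is the centroid
    for _ in range(1, k):
        best = None
        for candidate in data:  # one full k-means run per candidate
            trial = kmeans(data, centers + [tuple(candidate)])
            if best is None or sse(data, trial) < sse(data, best):
                best = trial
        centers = best
    return centers

def fast_global_kmeans(data, k):
    """FGKM (assumed rule): pick the candidate with the largest guaranteed
    error reduction, then run k-means only once per added center."""
    centers = [mean(data)]
    for _ in range(1, k):
        # Squared distance of each point to its current nearest center.
        d = [min(sq_dist(p, c) for c in centers) for p in data]

        def reduction(candidate):
            # Pairwise distances are recomputed here at every step.
            return sum(max(d[j] - sq_dist(candidate, data[j]), 0.0)
                       for j in range(len(data)))

        new_center = max(data, key=reduction)
        centers = kmeans(data, centers + [tuple(new_center)])
    return centers
```

For N data points, `global_kmeans` performs N k-means runs for each added center, while `fast_global_kmeans` replaces them with an O(N^2) pass over pairwise distances. Caching those distances in a matrix instead of recomputing them trades time for O(N^2) memory.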