Abstract: Clustering is a way of finding structure in a collection of unlabeled gene expression data. A number of algorithms have been developed to tackle the problem of clustering gene expression data. Clustering is important for solving problems that arise in unsupervised learning. This paper presents a performance analysis of various clustering algorithms, namely K-means, expectation maximization, and density-based clustering, in order to identify the best clustering algorithm for microarray data. The sum of squared errors and log-likelihood measures are used to evaluate the performance of these clustering methods.

Keywords: Clustering analysis on microarray data, comparison of clustering algorithms, clustering analysis on gene expression data, literature review on clustering methods, survey on clustering techniques.