Optimized Clustering Technique for High Dimensional Dataset

BAPUSAHEB B. BHUSARE, DIPALI G. MOGAL
Student, Dept. of Computer Science & Engineering, Govt. College of Engineering, Aurangabad (MH), India
Student, Dept. of Computer Engineering, SSBT College of Engg. & Tech., North Maharashtra University, Jalgaon (MH), India

Volume 2, Issue 4, April 2013

Abstract: High-dimensional, correlated data sets are common in data mining. Clustering is used to analyze the similarity between items in a database of any dimensionality. Effective similarity search on a high-dimensional database that contains correlated data requires improving or extending the conventional clustering methods. Existing indexing approaches such as vector approximation have drawbacks, notably that they ignore dependencies across dimensions, which leads to suboptimal results. The objective of the system is therefore to perform clustering that supports exact nearest-neighbor search, requires fewer random I/O operations than several recently proposed indexes, has low computational cost, and scales well with the dimensionality and size of the data set. It aims to achieve this by tightening the cluster-distance bounds, possibly by optimizing the clustering algorithm with techniques such as the Pillar algorithm. This paper presents a new mechanism for clustering the elements of high-resolution data in order to improve precision and reduce computation time. The system applies K-means clustering after the initial centroids have been optimized by the Pillar algorithm. The Pillar algorithm is modeled on the placement of pillars in a building: pillars should be located as far as possible from each other to withstand the pressure distribution of a roof, and the initial centroids, one per cluster, should be spread in the same way across the data distribution. This algorithm is able to optimize K-means clustering in terms of both precision and computation time. It designates the initial centroid positions by calculating the accumulated distance metric between each data point and all previously selected centroids, and then selects the data point with the maximum accumulated distance as the next initial centroid. In this way, all initial centroids are distributed according to the maximum accumulated distance metric.
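The centroid selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pillar_init` and `kmeans`, the use of Euclidean distance, and the choice of the grand mean as the starting reference point are assumptions made for illustration, and any outlier handling that a full Pillar implementation may include is omitted.

```python
import numpy as np

def pillar_init(X, k):
    """Simplified Pillar-style seeding (hypothetical helper, not the paper's code).

    Each new centroid is the data point with the largest accumulated
    Euclidean distance to all previously selected centroids.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: the first centroid is the point farthest from the grand mean.
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    acc = np.zeros(len(X))  # accumulated distance to the chosen centroids
    while len(chosen) < k:
        acc += np.linalg.norm(X - X[chosen[-1]], axis=1)
        candidates = acc.copy()
        candidates[chosen] = -np.inf  # never select the same point twice
        chosen.append(int(np.argmax(candidates)))
    return X[chosen]

def kmeans(X, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the given initial centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, labels

# Illustrative data: three small, well-separated groups of 2-D points.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10],
                   [20, 0], [20, 1], [21, 0]], dtype=float)
seeds = pillar_init(points, 3)
final_centroids, cluster_labels = kmeans(points, seeds)
```

Because every seed is drawn from the data and chosen to be far from the seeds already picked, the sketch is deterministic, whereas randomly initialized K-means can give different results from run to run.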

Keywords: Multimedia database, Similarity Search, Clustering and KNN Search.

How to Cite:

[1] Bapusaheb B. Bhusare and Dipali G. Mogal, “Optimized Clustering Technique for High Dimensional Dataset,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), vol. 2, issue 4, April 2013.

This work is licensed under a Creative Commons Attribution 4.0 International License.