An Optimized Version of the K-Means Clustering Algorithm

Cosmin Marian Poteraş, Cristian Mihăescu, Mihai Mocanu

DOI: http://dx.doi.org/10.15439/2014F258

Citation: Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 2, pages 695–699 (2014)


Abstract. This paper introduces an optimized version of the standard K-Means algorithm. The optimization targets running time. It is based on the observation that, after a certain number of iterations, only a small fraction of the data elements change their cluster, so there is no need to re-distribute all of them. The proposed implementation therefore separates the data elements that will not change their cluster during the next iteration from those that might, which significantly reduces the workload for very large data sets. The prototype implementation reduced running time by up to 70%.
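The abstract does not say how the elements that "won't change their cluster" are identified. The following is a minimal Python sketch of the general idea. It assumes a triangle-inequality bound in the style of Hamerly's algorithm, which may differ from the authors' own criterion. The names `kmeans_with_skipping`, `upper`, `lower` and `full_checks` are hypothetical. Each point keeps an upper bound on the distance to its own centroid and a lower bound on the distance to every other centroid. When the upper bound does not exceed the lower bound, the point provably keeps its cluster and is not re-distributed.

```python
import math
import random


def kmeans_with_skipping(points, k, max_iter=100, seed=0):
    """K-Means (Lloyd iterations) that skips points whose cluster provably cannot change.

    Hypothetical sketch: the skip test uses Hamerly-style bounds, which may differ
    from the criterion used in the paper.
      upper[i] >= distance from point i to its assigned centroid
      lower[i] <= distance from point i to every other centroid
    If upper[i] <= lower[i], point i keeps its cluster without being re-examined.
    Requires k >= 2.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    n, dim = len(points), len(points[0])
    assign = [0] * n
    upper = [math.inf] * n
    lower = [0.0] * n
    full_checks = []  # points fully re-distributed in each iteration

    for it in range(max_iter):
        changed = False
        checked = 0
        for i, p in enumerate(points):
            if upper[i] <= lower[i]:
                continue  # bounds prove the cluster cannot change
            upper[i] = math.dist(p, centers[assign[i]])  # tighten with the exact distance
            if upper[i] <= lower[i]:
                continue
            # Full re-distribution: find the nearest and second-nearest centroid.
            checked += 1
            ranked = sorted((math.dist(p, c), j) for j, c in enumerate(centers))
            (d1, j1), (d2, _) = ranked[0], ranked[1]
            changed = changed or j1 != assign[i]
            assign[i], upper[i], lower[i] = j1, d1, d2
        full_checks.append(checked)
        if it > 0 and not changed:
            break  # centroids are already the means of the current clusters

        # Recompute each centroid as the mean of its members.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, a in zip(points, assign):
            counts[a] += 1
            for d in range(dim):
                sums[a][d] += p[d]
        shifts = []
        for j in range(k):
            # An empty cluster keeps its previous centroid.
            new_c = [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            shifts.append(math.dist(new_c, centers[j]))
            centers[j] = new_c

        # Triangle inequality: a centroid moving by s changes any distance to it by at most s.
        max_shift = max(shifts)
        for i in range(n):
            upper[i] += shifts[assign[i]]
            lower[i] -= max_shift

    return assign, centers, full_checks
```

The returned `full_checks` list records how many points needed a full re-distribution in each iteration. Consistent with the observation in the abstract, it tends to shrink as the centroids settle. Using only the largest centroid shift to loosen `lower` keeps the bookkeeping at two numbers per point, at the cost of somewhat weaker bounds.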