Data Clustering with Grasshopper Optimization Algorithm

Szymon Łukasik; Piotr Andrzej Kowalski; Małgorzata Charytanowicz; Piotr Kulczycki

DOI: http://dx.doi.org/10.15439/2017F340

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 71–74 (2017)

Abstract. Dividing a dataset into disjoint groups of homogeneous structure, known as data clustering, is an important problem in data analysis. It can be solved with a broad range of methods employing statistical approaches or heuristic procedures. The latter often incorporate mechanisms observed in nature, as these are known to serve as useful components of effective optimizers. The paper investigates the possibility of using a novel nature-inspired technique, the Grasshopper Optimization Algorithm (GOA), to generate accurate data clusterings. The internal clustering validation measure known as the Calinski-Harabasz index is employed to assess the quality of the produced solutions. The paper describes the proposed algorithm and evaluates it experimentally on a set of benchmark instances. Over the course of our study it was established that GOA-based clustering achieves high accuracy when compared with the standard K-means procedure.
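The approach summarized above can be illustrated with a minimal Python/NumPy sketch. It rests on several assumptions that are not taken from the paper itself. Each grasshopper encodes k cluster centres as one flat vector, and every data point is assigned to its nearest centre. The Calinski-Harabasz index, computed as between-cluster dispersion divided by within-cluster dispersion with each term scaled by its degrees of freedom, serves as the fitness to be maximized. The position update follows the general form published by Saremi et al.: a social force s(r) = f*exp(-r/l) - exp(-r) with f = 0.5 and l = 1.5, a comfort-zone coefficient c that decreases linearly, and attraction toward the best solution found so far. That update is simplified here. The encoding, the parameter values, and the helper names (calinski_harabasz, decode, goa_clustering) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)]; higher is better."""
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    if k < 2 or k >= n:
        return 0.0  # index undefined; treated as the worst score (assumption)
    overall_mean = X.mean(axis=0)
    between = 0.0  # trace of the between-cluster scatter matrix
    within = 0.0   # trace of the within-cluster scatter matrix
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    if within == 0.0:
        return 1.0  # same convention as scikit-learn for zero within-cluster scatter
    return float(between * (n - k) / (within * (k - 1)))

def decode(position, X, k):
    """Hypothetical encoding: a flat vector of k centres; assign points to the nearest one."""
    centers = position.reshape(k, X.shape[1])
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def social_force(r, f=0.5, l=1.5):
    """GOA social interaction s(r) = f * exp(-r / l) - exp(-r)."""
    return f * np.exp(-r / l) - np.exp(-r)

def goa_clustering(X, k, n_agents=20, max_iter=100, c_max=1.0, c_min=4e-5, seed=0):
    """Simplified GOA search for k centres maximizing the Calinski-Harabasz index."""
    rng = np.random.default_rng(seed)
    dim = k * X.shape[1]
    lb = np.tile(X.min(axis=0), k)  # search box spans the data range per feature
    ub = np.tile(X.max(axis=0), k)
    pos = rng.uniform(lb, ub, size=(n_agents, dim))
    fitness = np.array([calinski_harabasz(X, decode(p, X, k)) for p in pos])
    best, best_fit = pos[fitness.argmax()].copy(), fitness.max()
    for t in range(max_iter):
        c = c_max - t * (c_max - c_min) / max_iter  # shrinking comfort zone
        new_pos = np.empty_like(pos)
        for i in range(n_agents):
            social = np.zeros(dim)
            for j in range(n_agents):
                if i == j:
                    continue
                diff = pos[j] - pos[i]
                dist = np.linalg.norm(diff) + 1e-12  # avoid division by zero
                social += c * (ub - lb) / 2 * social_force(np.abs(diff)) * diff / dist
            # New position: scaled social term plus the best solution so far (target)
            new_pos[i] = np.clip(c * social + best, lb, ub)
        pos = new_pos  # synchronous update of the whole swarm
        for p in pos:
            f = calinski_harabasz(X, decode(p, X, k))
            if f > best_fit:
                best, best_fit = p.copy(), f
    return decode(best, X, k), best_fit
```

For example, goa_clustering(X, 3) returns one cluster label per row of X together with the best Calinski-Harabasz value found. A K-means baseline, as used for comparison in the paper, could be scored with the same calinski_harabasz function.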