Clustering based on the Krill Herd Algorithm with Selected Validity Measures
Piotr Andrzej Kowalski, Szymon Łukasik, Małgorzata Charytanowicz, Piotr Kulczycki
DOI: http://dx.doi.org/10.15439/2016F295
Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 79–87 (2016)
Abstract. This paper describes a new approach to metaheuristic-based data clustering by means of the Krill Herd Algorithm (KHA). In this work, KHA is used to find the centres of the cluster groups. The number of clusters is fixed at the beginning of the procedure, and during the subsequent iterations of the optimization algorithm, particular solutions are evaluated with selected validity criteria. The proposed clustering algorithm has been numerically verified on ten data sets taken from the UCI Machine Learning Repository. Additionally, all clustering results were compared with those of the most popular method, k-means, with the Rand Index applied as a validity measure.
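To make the solution encoding and evaluation concrete, the sketch below is a minimal, hypothetical illustration, not the paper's implementation: a candidate solution is a flat vector holding K cluster centres, its quality is scored with a simple validity criterion (here, within-cluster sum of squares), and the resulting partition is compared with reference labels via the Rand Index. The helper names (assign_to_centres, within_cluster_ss, rand_index) are assumptions introduced for this example, and a plain random-search loop stands in for the krill herd population dynamics.

```python
# Minimal sketch of metaheuristic clustering by centre optimisation (illustrative only).
# Any population-based optimizer, including KHA, can drive the same objective function.
import numpy as np

def assign_to_centres(X, centres):
    """Label each point with the index of its nearest centre (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return d.argmin(axis=1)

def within_cluster_ss(X, centres):
    """Validity criterion to be minimised: total within-cluster sum of squares."""
    labels = assign_to_centres(X, centres)
    return sum(((X[labels == k] - c) ** 2).sum() for k, c in enumerate(centres))

def decode(position, K, dim):
    """A candidate's position is a flat vector of length K*dim; reshape it into K centres."""
    return position.reshape(K, dim)

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    n = len(a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / (n * (n - 1) / 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two Gaussian blobs with known labels.
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    truth = np.array([0] * 50 + [1] * 50)
    K, dim = 2, X.shape[1]
    # Stand-in for the KHA population loop: random search over candidate positions.
    best, best_cost = None, np.inf
    for _ in range(500):
        position = rng.uniform(X.min(), X.max(), K * dim)
        cost = within_cluster_ss(X, decode(position, K, dim))
        if cost < best_cost:
            best, best_cost = position, cost
    labels = assign_to_centres(X, decode(best, K, dim))
    print("within-cluster SS:", round(best_cost, 2),
          "Rand Index:", round(rand_index(labels, truth), 3))
```

In the method described in the paper, the candidate positions would be updated according to the KHA motion rules (induced movement, foraging and random diffusion) rather than by uniform random sampling, and the objective would be one of the selected cluster validity criteria rather than the simple sum-of-squares used in this sketch.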