Proceedings of the 16th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 25

A Novel Cluster Ensemble based on a Single Clustering Algorithm

DOI: http://dx.doi.org/10.15439/2021F28

Citation: Proceedings of the 16th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 25, pages 127–135

Abstract. In recent years, several cluster ensemble methods have been developed, but they still have limitations. They often use different clustering algorithms in the two stages of a cluster ensemble method, namely the ensemble generation step and the consensus function, which results in compatibility issues. To address this, we propose a novel cluster ensemble method based on an identical clustering algorithm (CEI). Experiments on real-world datasets from various sources show that CEI improves accuracy by 5% on average compared to state-of-the-art cluster ensemble methods and by 55.54% compared to AP, while consuming 44.60% less execution time.
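
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' CEI method, whose details are not given on this page. It only shows the general idea of reusing one clustering algorithm for both ensemble generation and the consensus function. Everything here is an assumption made for illustration: plain k-means stands in for the base algorithm, a co-association matrix (the fraction of ensemble members that put two points in the same cluster) carries the ensemble's evidence, and the function names `kmeans`, `co_association` and `single_algorithm_ensemble` are hypothetical.

```python
import random


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns labels.

    Stand-in base clusterer (an assumption, not the paper's algorithm).
    Requires at least k distinct points.
    """
    rng = random.Random(seed)
    # Start from k distinct points so no two initial centers coincide.
    centers = rng.sample(sorted(set(points)), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def co_association(partitions):
    """Row i, column j: fraction of partitions placing i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [
        tuple(sum(p[i] == p[j] for p in partitions) / m for j in range(n))
        for i in range(n)
    ]


def single_algorithm_ensemble(points, k, n_members=10):
    # Stage 1 (ensemble generation): same algorithm, different random seeds.
    partitions = [kmeans(points, k, seed=s) for s in range(n_members)]
    # Stage 2 (consensus function): the SAME algorithm, applied to the
    # co-association rows instead of the raw features.
    return kmeans(co_association(partitions), k, seed=0)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(single_algorithm_ensemble(data, k=2))
```

On two well-separated groups such as the toy data above, the consensus labels should put the first three points in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initialisation.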
