Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

On Gower Similarity Coefficient and Missing Values

DOI: http://dx.doi.org/10.15439/2023F3575

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 1035–1040

Abstract. The Gower similarity coefficient is a popular measure for comparing objects with possibly mixed-type attributes and missing values. One of its characteristics is that it computes the coefficient value while ignoring attributes with missing values. In this article, we explore the properties of the coefficient in detail, including the consequences of omitting attributes with missing values. We also introduce strict lower and upper bounds on the Gower similarity coefficient, derive a number of their properties, and propose a solution to the identified problem with the Gower similarity coefficient.
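
To make the omission of missing values concrete, the following is a minimal Python sketch of the standard Gower similarity with unit weights: each attribute contributes a score in [0, 1], and attributes where either value is missing are skipped entirely. The function names, the use of `None` as the missing-value marker, and the example data are assumptions made for illustration. The `naive_bounds` helper shows one simple way to bracket the coefficient, by letting every skipped comparison count as 0 or as 1; it is not necessarily the pair of bounds derived in the paper.

```python
from typing import Optional, Sequence, Tuple, Union

Value = Union[float, str, None]  # None marks a missing value (assumed convention)


def attribute_similarity(a, b, value_range: Optional[float]) -> float:
    """Per-attribute score in [0, 1].

    value_range is None for a categorical attribute (exact match scores 1),
    otherwise the positive range of a numeric attribute.
    """
    if value_range is None:
        return 1.0 if a == b else 0.0
    return 1.0 - abs(a - b) / value_range


def comparable_scores(x: Sequence[Value], y: Sequence[Value],
                      ranges: Sequence[Optional[float]]) -> list:
    """Scores for attributes where both objects have a value."""
    return [
        attribute_similarity(a, b, r)
        for a, b, r in zip(x, y, ranges)
        if a is not None and b is not None
    ]


def gower_similarity(x, y, ranges) -> Optional[float]:
    """Gower similarity with unit weights; missing attributes are omitted."""
    scores = comparable_scores(x, y, ranges)
    if not scores:
        return None  # no comparable attribute, so the coefficient is undefined
    return sum(scores) / len(scores)


def naive_bounds(x, y, ranges) -> Tuple[float, float]:
    """Illustrative bounds: each skipped comparison counts as 0 (lower) or 1 (upper)."""
    scores = comparable_scores(x, y, ranges)
    total = len(ranges)
    missing = total - len(scores)
    return sum(scores) / total, (sum(scores) + missing) / total


# Hypothetical objects with four attributes: numeric, categorical, categorical, numeric.
x = (1.0, "red", None, 30.0)
y = (1.0, "red", "large", None)
ranges = (10.0, None, None, 50.0)

print(gower_similarity(x, y, ranges))  # 1.0
print(naive_bounds(x, y, ranges))      # (0.5, 1.0)
```

In this example the two objects agree on the only two attributes that can be compared, so the coefficient reports a perfect similarity of 1.0 even though half of the attributes were never compared. The interval from 0.5 to 1.0 makes that uncertainty visible.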
