Analysis of the Public Health Service in Bogotá, Colombia: a Study Based on Customer's Complains and Using Unsupervised Learning Algorithms
Sebastian Quinchia-Lobo, Daniela Salazar-Gonzalez, Daniel Salas-Álvarez, Rubén Baena-Navarro, Isaac Caicedo-Castro
DOI: http://dx.doi.org/10.15439/2023F8583
Citation: Position Papers of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 36, pages 87–95 (2023)
Abstract. In this study, we analyze public health services in the city of Bogotá, Colombia. We used unsupervised learning algorithms to cluster the requests, complaints, claims, and denunciations (RCCD) submitted to Supersalud in 2021, which we collected from Supersalud's databases. We applied K-Means, Bisecting K-Means, and Gaussian Mixture clustering and evaluated the quality of each resulting clustering with the silhouette coefficient. The algorithm with the best clustering quality was then used to generate the final clusters. Of the eight clusters, the first two had the highest incidence, with 181 and 249 affected affiliates per 2,000 affiliates in 2021. In the first cluster, a strong association (55% support, 100% confidence) was found between problems related to medical care facilities and restricted access to health services. In addition, these two clusters contained RCCD involving chronic communicable and non-communicable diseases (respiratory, diabetes, renal, risk factors, and cardiovascular) associated with restricted access to health services. In conclusion, unsupervised clustering made it possible to analyze public health services from the perspective of the RCCD, providing valuable information on users' experiences and on the challenges in providing health services in Bogotá. From several perspectives, these findings show restricted access to health services, reflecting a deficient provision of health services in Bogotá, Colombia.
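The model-selection step (comparing several clustering algorithms by silhouette coefficient) can be illustrated with a minimal sketch. For each record, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The overall score is the mean of s(i), and values near 1 indicate compact, well-separated clusters. The sketch assumes plain Euclidean distance on already-encoded numeric feature vectors; the helper names `mean_silhouette` and `pick_best` are hypothetical, and the two labelings in the example are invented stand-ins for the outputs of K-Means, Bisecting K-Means, or Gaussian Mixture, not results from this study.

```python
import math
from collections import defaultdict


def mean_silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance (illustrative)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        # a(i): mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def pick_best(points, candidates):
    """Return the name and score of the labeling with the highest silhouette."""
    scores = {name: mean_silhouette(points, labs) for name, labs in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "labeling_a": [0, 0, 0, 1, 1, 1],  # matches the two groups
    "labeling_b": [0, 1, 0, 1, 0, 1],  # mixes the groups
}
best_name, best_score = pick_best(points, candidates)
```

Here `labeling_a` wins with a score of roughly 0.92, while the mixed labeling scores near zero. On large datasets, a distributed implementation of the same metric would replace this quadratic loop.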
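The incidence figures quoted for the first two clusters are rates scaled to 2,000 affiliates. A minimal sketch of that arithmetic follows. It assumes the rate is the number of affected affiliates in a cluster divided by the number of affiliates at risk over the same period (2021), multiplied by 2,000. The exact denominator is not stated here, and the function name `incidence_per` and the counts below are hypothetical, not data from the study.

```python
def incidence_per(affected, population_at_risk, base=2000):
    """Affected affiliates per `base` affiliates at risk over the same period."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    # Multiply before dividing to keep small integer inputs exact.
    return affected * base / population_at_risk


# Hypothetical counts: 30 affected affiliates out of 1,200 at risk.
rate = incidence_per(30, 1200)  # 50.0 per 2,000 affiliates
```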
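The 55% support and 100% confidence reported for the first cluster are the usual association-rule measures. For a rule X → Y over a set of records, support is the fraction of records that contain both X and Y, and confidence is support(X ∪ Y) / support(X). A confidence of 100% therefore means that every record containing X also contains Y. The sketch below treats each RCCD as a set of category labels. The labels, records, and helper names are invented for illustration and are not taken from the Supersalud data.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / sup_a


# Hypothetical RCCD records, each reduced to a set of category labels.
records = [
    {"care_facility_problem", "restricted_access"},
    {"care_facility_problem", "restricted_access", "diabetes"},
    {"restricted_access"},
    {"billing"},
]
rule_support = support({"care_facility_problem", "restricted_access"}, records)  # 0.5
rule_confidence = confidence({"care_facility_problem"}, {"restricted_access"}, records)  # 1.0
```

In this toy example, both records that mention a care-facility problem also mention restricted access, so the rule reaches 100% confidence at 50% support.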