A novel ensemble learning technique of shallow models applied on a COVID-19 dataset
Diogen Babuc
DOI: http://dx.doi.org/10.15439/2024F8981
Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 537–542 (2024)
Abstract. The COVID-19 pandemic affected lives worldwide. To help address this crisis, we propose a novel ensemble learning strategy for COVID-19 prediction and classification problems. Our method combines a range of shallow models, namely K-Nearest Neighbors, Decision Trees, Support Vector Machines, Classification and Regression Trees, and Extreme Gradient Boosting, chosen for their capacity to handle the complex and varied nature of COVID-19 data. Each model is trained independently on a COVID-19 dataset, and ensemble learning techniques are then used to integrate the models' predictions. We apply strict model validation and hyperparameter optimization to improve performance. Compared with single models and traditional ensemble techniques, our ensemble method shows considerable improvements in classification performance and prediction accuracy.
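The general idea of training several shallow models independently and then integrating their predictions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the combination rule, so simple majority voting is used here; the three tiny classifiers (a k-nearest-neighbor rule, a depth-1 decision tree, and a nearest-centroid rule) are stand-ins for the models named above; and the data are synthetic two-class points, not the COVID-19 dataset.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def knn_fit(X, y, k=3):
    """k-nearest-neighbor classifier (illustrative stand-in)."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
        return Counter(y[i] for i in nearest).most_common(1)[0][0]
    return predict

def stump_fit(X, y):
    """Depth-1 decision tree for labels 0/1: pick the single-feature
    threshold with the highest training accuracy."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                hits = sum((left if row[f] <= t else right) == label
                           for row, label in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right

def centroid_fit(X, y):
    """Nearest-centroid classifier (illustrative stand-in)."""
    centroids = {}
    for c in set(y):
        rows = [r for r, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(centroids[c], x))

def majority_vote(models, x):
    """Integrate the base models' predictions by hard majority voting."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

def make_blobs(n_per_class, rng):
    """Synthetic two-class data: Gaussian blobs around (0, 0) and (3, 3)."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_blobs(40, rng)
X_test, y_test = make_blobs(20, rng)

# Each base model is trained independently on the same training data.
models = [knn_fit(X_train, y_train),
          stump_fit(X_train, y_train),
          centroid_fit(X_train, y_train)]

# The ensemble prediction is the majority vote over the base models.
ensemble_pred = [majority_vote(models, x) for x in X_test]
ensemble_acc = sum(p == t for p, t in zip(ensemble_pred, y_test)) / len(y_test)
print(f"ensemble accuracy on held-out synthetic data: {ensemble_acc:.2f}")
```

An odd number of base models is used so that a binary vote cannot tie. Weighted voting or stacking would be equally plausible combination rules; the sketch does not claim to reproduce the one used in the paper.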