Imputing Missing Values for Improved Statistical Inference Applied to Intrauterine Growth Restriction Problem

Agnieszka Wosiak; Kinga Glinka; Agata Zamecznik; Katarzyna Niewiadomska-Jarosik

DOI: http://dx.doi.org/10.15439/2018F196

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 129–135 (2018)

Abstract. This paper describes a study of the problem of missing values in medical data collected to discover new dependencies between the parameters of children born with intrauterine growth restriction disorder. The aim of the research is to propose a procedure that can be followed to improve medical inference in the presence of missing data. An approach using unconditional mean imputation and k-nearest neighbor imputation was applied. The experiments showed that imputing the missing values in the original dataset yields more valuable dependencies than the original, incomplete data, while keeping the confidence interval for goodness of fit with the original distribution above 90%. The discovered dependencies may form the basis for new treatment procedures for children with intrauterine growth restriction disorder.
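The abstract names the two imputation techniques but gives no implementation details. The following is a minimal sketch, assuming a purely numeric table in which `None` marks a missing value. The function names, the neighbor distance (root mean squared difference over the attributes observed in both records), the default `k`, and the use of a two-sample Kolmogorov-Smirnov statistic as the goodness-of-fit check are all illustrative assumptions, not the authors' documented procedure.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # one patient record; None marks a missing value


def column_means(data: Sequence[Row]) -> List[float]:
    """Mean of the observed (non-None) values in each column."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        # Assumption: every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return means


def mean_impute(data: Sequence[Row]) -> List[List[float]]:
    """Unconditional mean imputation: fill each gap with its column mean."""
    means = column_means(data)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def knn_impute(data: Sequence[Row], k: int = 3) -> List[List[float]]:
    """k-NN imputation: fill each gap with the mean of the k nearest donors.

    Hypothetical distance: RMS difference over the columns observed in both
    records. Donors must have the target column observed. Falls back to the
    column mean when no donor is available.
    """
    means = column_means(data)
    n_cols = len(data[0])
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j in range(n_cols):
            if row[j] is not None:
                continue  # observed values are never altered
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])  # stable: ties keep row order
            nearest = candidates[:k]
            result[i][j] = (sum(v for _, v in nearest) / len(nearest)
                            if nearest else means[j])
    return result


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

To check how far imputation moves a parameter's distribution, one could compare the observed values of a column with the same column after imputation, e.g. `ks_statistic(observed_col, [row[j] for row in knn_impute(data)])`; a small D means the completed column stays close to the original distribution. Note that unconditional mean imputation places every filled value at the column mean, so it tends to shrink the variance, whereas k-NN imputation draws on similar records and usually preserves the spread better.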
