Predictive and Descriptive Analysis for Heart Disease Diagnosis

František Babič; Jaroslav Olejár; Zuzana Vantová; Ján Paralič

DOI: http://dx.doi.org/10.15439/2017F219

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 155–163 (2017)

Abstract. Heart disease describes a range of conditions affecting the heart. It can include blood vessel diseases such as coronary artery disease, heart rhythm problems, and heart defects. In addition, the term is often used for cardiovascular disease, i.e. narrowed or blocked blood vessels leading to a heart attack, chest pain, or stroke. In our work, we analyzed three available datasets focused on heart diseases: Heart Disease Database, South African Heart Disease, Z-Alizadeh Sani Dataset, and Cardiac Dataset. We pursued two directions: predictive analysis based on decision trees, Naïve Bayes, Support Vector Machine, and neural networks; and descriptive analysis based on association and decision rules. Our results are plausible and in some cases comparable to or better than those reported in related work.
