Verifying cuts as a tool for improving a classifier based on a decision tree

Łukasz Dydo, Jan Bazan, Sylwia Buregwa-Czuma, Wojciech Rząsa, Andrzej Skowron

DOI: http://dx.doi.org/10.15439/2016F266

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 17–20 (2016)


Abstract. This article continues previous work in which a new method of decision tree construction was presented. That method is based on so-called verifying cuts, which can recover knowledge from attributes that are frequently eliminated when greedy methods choosing a single best cut are applied. Until now, only one strategy for choosing verifying cuts had been examined. It uses a measure based on the number of pairs of objects discerned by a chosen cut. In this paper, we examine two additional measures for determining the best verifying cuts, based on the Gini index and on entropy. The paper includes the results of experiments performed on data obtained from a biomedical database and from machine learning repositories.
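The three cut-quality measures named in the abstract can be illustrated on a single numeric attribute. The sketch below is a minimal illustration, assuming the usual textbook definitions: a cut c sends an object to the left if its attribute value is below c, candidate cuts are midpoints between consecutive distinct values, the pair-based measure counts pairs of objects with different decisions that fall on opposite sides of the cut, and the Gini and entropy measures are the decrease in weighted impurity. It does not reproduce the authors' procedure for selecting verifying cuts, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def split(values, labels, cut):
    """Partition decision labels by a cut on a numeric attribute (left if value < cut)."""
    left = [d for v, d in zip(values, labels) if v < cut]
    right = [d for v, d in zip(values, labels) if v >= cut]
    return left, right


def discerned_pairs(values, labels, cut):
    """Number of pairs of objects with different decisions separated by the cut."""
    left, right = split(values, labels, cut)
    lc, rc = Counter(left), Counter(right)
    # All left-right pairs minus those sharing the same decision.
    same = sum(lc[d] * rc[d] for d in lc)
    return len(left) * len(right) - same


def gini(labels):
    """Gini impurity of a list of decision labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def impurity_gain(values, labels, cut, impurity):
    """Decrease of impurity (gini or entropy) achieved by the cut; higher is better."""
    left, right = split(values, labels, cut)
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


def candidate_cuts(values):
    """Midpoints between consecutive distinct attribute values (an assumed convention)."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def best_cut(values, labels, score):
    """Candidate cut maximising score(values, labels, cut)."""
    return max(candidate_cuts(values), key=lambda c: score(values, labels, c))


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6]
    labels = ["a", "a", "b", "b", "a", "b"]
    print(best_cut(values, labels, discerned_pairs))
    print(best_cut(values, labels, lambda v, l, c: impurity_gain(v, l, c, gini)))
    print(best_cut(values, labels, lambda v, l, c: impurity_gain(v, l, c, entropy)))
```

On this toy data all three measures prefer the cut 2.5 (it separates 6 pairs with different decisions); on real data they can disagree, which is why comparing them as criteria for choosing cuts is of interest.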

References

Bazan, J. G., Bazan-Socha, S., Buregwa-Czuma, S., Dydo, Ł., Rząsa, W., Skowron, A.: A classifier based on a decision tree with verifying cuts. Fundamenta Informaticae, vol. 143, no. 1–2, 2016, pp. 1–18.
Bazan, J. G., Bazan-Socha, S., Buregwa-Czuma, S., Pardel, P. W., Sokolowska, B.: Predicting the presence of serious coronary artery disease based on 24-hour Holter ECG monitoring. In: M. Ganzha, L. Maciaszek, M. Paprzycki (eds.), Proceedings of the Federated Conference on Computer Science and Information Systems, IEEE Xplore digital library, 2012, pp. 279–286.
Bazan, J. G., Szczuka, M.: The Rough Set Exploration System. Transactions on Rough Sets III, LNCS 3400, 2005, pp. 37–56.
Bazan, J. G., Nguyen, H. S., Nguyen, S. H., Synak, P., Wróblewski, J.: Rough set algorithms in classification problems. In: L. Polkowski, T. Y. Lin, S. Tsumoto (eds.), Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems. Studies in Fuzziness and Soft Computing, vol. 56, Springer-Verlag/Physica-Verlag, 2000, pp. 49–88.
Breiman, L., et al.: Classification and Regression Trees. Wadsworth, Belmont, 1984.
The Elements of Statistical Learning repository, http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/
Kent Ridge Biomedical Dataset repository, http://datam.i2r.a-star.edu.sg/datasets/krbd/
Nguyen, H. S.: Approximate Boolean Reasoning: Foundations and Applications in Data Mining. Transactions on Rough Sets V, LNCS 4100, 2006, pp. 334–506.
Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.
Shannon, C. E.: A mathematical theory of communication. Bell System Technical Journal, vol. 27, 1948, pp. 379–423.
UC Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/