Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 8

Proceedings of the 2016 Federated Conference on Computer Science and Information Systems

Random Forest Feature Selection for Data Coming from Evaluation Sheets of Subjects with ASDs


DOI: http://dx.doi.org/10.15439/2016F274

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 299–302


Abstract. We deal with the problem of the initial analysis of data coming from evaluation sheets of subjects with Autism Spectrum Disorders (ASDs). In our research, we use an original evaluation sheet comprising questions about competencies grouped into 17 spheres. In this paper, we focus on the feature selection problem: the goal is to identify the relevant features so that simpler and more accurate classifiers can be built. A feature selection method based on random forest is used.
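The paper's exact pipeline (and the all-relevant Boruta variant cited in the references) is not reproduced here; the following is only a minimal sketch of the general idea of random-forest-based feature ranking, assuming scikit-learn and synthetic data standing in for the evaluation-sheet responses. The feature count of 17 mirrors the 17 competency spheres mentioned in the abstract; the choice of 5 informative features and the top-k cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the evaluation-sheet data:
# 17 features, one per competency sphere (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=17,
                           n_informative=5, random_state=0)

# Fit a random forest; its impurity-based importances rank the features.
forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X, y)

# Sort features by mean decrease in impurity and keep the strongest k.
ranking = np.argsort(forest.feature_importances_)[::-1]
top_k = 5  # hypothetical cutoff; the paper's method decides this differently
selected = ranking[:top_k]
print("selected feature indices:", sorted(selected.tolist()))
```

A simpler classifier trained only on `X[:, selected]` can then be compared against one trained on all 17 features, which is the motivation the abstract gives for feature selection.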

References

  1. K. Cios, W. Pedrycz, R. Swiniarski, and L. Kurgan, Data mining. A knowledge discovery approach. New York: Springer, 2007.
  2. S. García, J. Luengo, and F. Herrera, Data Preprocessing in Data Mining, ser. Intelligent Systems Reference Library. Switzerland: Springer International Publishing, 2015, vol. 72.
  3. N. Jankowski and M. Grochowski, “Comparison of instances selection algorithms I. Algorithms survey,” in Artificial Intelligence and Soft Computing - ICAISC 2004, ser. Lecture Notes in Computer Science, L. Rutkowski, J. H. Siekmann, R. Tadeusiewicz, and L. A. Zadeh, Eds. Berlin, Heidelberg: Springer-Verlag, 2004, vol. 3070, pp. 598–603.
  4. K. Pancerz, A. Derkacz, and J. Gomuła, “Consistency-based preprocessing for classification of data coming from evaluation sheets of subjects with ASDs,” in Position Papers of the 2015 Federated Conference on Computer Science and Information Systems (FedCSIS’2015), ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 6, Lodz, Poland, 2015, pp. 63–67. http://dx.doi.org/10.15439/2015F393
  5. E. Tuv, A. Borisov, G. Runger, and K. Torkkola, “Feature selection with ensembles, artificial variables, and redundancy elimination,” Journal of Machine Learning Research, vol. 10, pp. 1341–1366, 2009.
  6. W. R. Rudnicki, M. Wrzesień, and W. Paja, “All relevant feature selection methods and applications,” in Feature Selection for Data and Pattern Recognition, ser. Studies in Computational Intelligence, U. Stańczyk and L. C. Jain, Eds. Berlin, Heidelberg: Springer-Verlag, 2015, vol. 584, pp. 11–28.
  7. M. Kursa and W. Rudnicki, “Feature selection with the Boruta package,” Journal of Statistical Software, vol. 36, no. 1, 2010. http://dx.doi.org/10.18637/jss.v036.i11
  8. J. G. Bazan and M. S. Szczuka, “The Rough Set Exploration System,” in Transactions on Rough Sets III, ser. Lecture Notes in Artificial Intelligence, J. Peters and A. Skowron, Eds. Berlin, Heidelberg: Springer-Verlag, 2005, vol. 3400, pp. 37–56.
  9. J. Demšar, T. Curk, A. Erjavec, Črt Gorup, T. Hočevar, M. Milutinovič, M. Možina, M. Polajnar, M. Toplak, A. Starič, M. Štajdohar, L. Umek, L. Žagar, J. Žbontar, M. Žitnik, and B. Zupan, “Orange: Data mining toolbox in Python,” Journal of Machine Learning Research, vol. 14, pp. 2349–2353, 2013.
  10. J. Grzymala-Busse, “A new version of the rule induction system LERS,” Fundamenta Informaticae, vol. 31, pp. 27–39, 1997.
  11. Z. Pawlak and A. Skowron, “Rudiments of rough sets,” Information Sciences, vol. 177, pp. 3–27, 2007. http://dx.doi.org/10.1016/j.ins.2006.06.003