Data Compression Measures for Meta-Learning Systems
Marcin Blachnik, Mirosław Kordos, Sławomir Golak
DOI: http://dx.doi.org/10.15439/2018F87
Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 25–28 (2018)
Abstract. An important issue in building predictive models is the ability to quickly assess various aspects of the achievable performance of the model, so that we know what outcome to expect and how to build the model optimally. As instance selection is one of the preprocessing steps that has to be performed anyway, we can use it to obtain the meta-data descriptors for meta-learning systems. When we only need to estimate the classification accuracy of the model, the compression obtained from instance selection is a good approximator. However, when we need to estimate other performance measures, such as precision and sensitivity, the quality of the estimated performance drops. To overcome this issue we propose a new type of compression measure: the balanced compression, which shows high correlation with the precision and sensitivity of the final classifiers. We also show that applying the balanced compression as a meta-learning descriptor allows for a precise assessment of model performance, as demonstrated by the presented experimental evaluation.
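The abstract does not give the formula for the compression measures, so the following is only a minimal sketch of how they might be computed; it assumes that compression is the fraction of instances removed by instance selection and that the balanced compression averages the per-class compression levels (by analogy with balanced accuracy). The function names and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def compression(y_original, y_selected):
    """Overall compression: fraction of instances removed by instance selection."""
    return 1.0 - len(y_selected) / len(y_original)

def balanced_compression(y_original, y_selected):
    """Hypothetical balanced compression: per-class compression averaged over classes."""
    per_class = []
    for c in np.unique(y_original):
        n_orig = np.sum(y_original == c)   # class size before selection
        n_sel = np.sum(y_selected == c)    # class size after selection
        per_class.append(1.0 - n_sel / n_orig)
    return float(np.mean(per_class))

# Toy example: 100 instances (90 of class 0, 10 of class 1),
# instance selection keeps 18 of class 0 and 8 of class 1.
y_all = np.array([0] * 90 + [1] * 10)
y_kept = np.array([0] * 18 + [1] * 8)
print(compression(y_all, y_kept))           # 0.74 (dominated by the majority class)
print(balanced_compression(y_all, y_kept))  # 0.50 (treats both classes equally)
```

The toy example illustrates why a class-aware measure is of interest: the overall compression is dominated by the majority class, whereas the balanced variant reflects how strongly each class is reduced.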
References
- N. Jankowski, W. Duch, K. Grąbczewski, Meta-learning in computational intelligence. Springer Science & Business Media, vol. 358, 2011.
- M. Blachnik, “On the relation between kNN accuracy and dataset compression level,” LNAI, vol. 9692, pp. 541–551, 2016.
- M. Blachnik, “Instance selection for classifier performance estimation in meta learning,” Entropy, vol. 19, no. 11, p. 583, 2017.
- C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
- M. Kordos, M. Blachnik, J. Kozłowski, M. Perzyk, O. Bystrzycki, M. Gródek, A. Byrdziak, Z. Motyka, “A Hybrid System with Regression Trees in Steelmaking Process,” LNAI, vol. 6678, pp. 222–229, June 2011.
- M. Kordos, “Optimization of Evolutionary Instance Selection,” LNAI, vol. 10245, pp. 359–369, ICAISC, June 2017.
- S. García, J. Derrac, J. R. Cano, and F. Herrera, “Prototype selection for nearest neighbor classification: Taxonomy and empirical study,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 417–435, 2012.
- P. Hart, “The condensed nearest neighbor rule,” IEEE Trans. on Information Theory, vol. 14, pp. 515–516, 1968.
- D. Wilson, “Asymptotic properties of nearest neighbor rules using edited data,” IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-2, pp. 408–421, 1972.
- M. Kordos, M. Blachnik, and S. Białka, “Instance selection in logical rule extraction for regression problems,” LNAI, vol. 7895, pp. 167–175, 2013.
- F. Herrera, “KEEL: Knowledge Extraction based on Evolutionary Learning,” 2005. Spanish National Projects TIC2002-04036-C05, TIN2005-08386-C05 and TIN2008-06681-C06. [Online]. Available: http://www.keel.es
- M. Blachnik and M. Kordos, “Information selection and data compression RapidMiner library,” in Machine Intelligence and Big Data in Industry. Springer, 2016, pp. 135–145.
- M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning,” in Advances in Neural Information Processing Systems 28, Curran Associates, Inc., 2015, pp. 2962–2970.
- M. Kozielski, “A meta-learning approach to methane concentration value prediction,” in Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. Springer, 2015, pp. 716–726.
- M. Reif, F. Shafait, and A. Dengel, “Meta-learning for evolutionary parameter optimization of classifiers,” Machine Learning, vol. 87, no. 3, pp. 357–380, 2012.
- F. Pinto, C. Soares, and J. Mendes-Moreira, “Towards automatic generation of metafeatures,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2016, pp. 215–226.
- Q. Sun and B. Pfahringer, “Pairwise meta-rules for better meta-learning-based algorithm ranking,” Machine Learning, vol. 93, no. 1, pp. 141–161, 2013.
- B. L. Welch, “The generalization of ‘Student’s’ problem when several different population variances are involved,” Biometrika, vol. 34, no. 1/2, pp. 28–35, 1947.