About Classifiers Quality Assessment: Balanced Accuracy Curve (BAC) as an alternative for ROC and PR Curve

Aleksandra Weiss; Marcin Młyński; Piotr Artiemjew

DOI: http://dx.doi.org/10.15439/2022F262

Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 149–156 (2022)

Abstract. In this work, we propose a new parameter for assessing the effectiveness of classifiers: the AUC (area under the curve) of the balanced accuracy curve (BAC), evaluated on data with different degrees of class balance. We compare it with the popular AUC parameters for the ROC and PR curves. We use a global kNN classifier with typical metrics to verify the utility of the new parameter. The BAC, ROC and PR curves produce similar results; the advantage of BAC is its simplicity of implementation and the ease of interpreting its results.
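The abstract does not spell out how the curve is built, so the following is only a minimal sketch under stated assumptions: the classifier returns a score in [0, 1] for the positive class, balanced accuracy (the mean of the true positive rate and the true negative rate) is plotted against the decision threshold, and the area is taken with the trapezoid rule. The function names `balanced_accuracy_curve` and `bac_auc` and the evenly spaced threshold grid are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def balanced_accuracy_curve(y_true, scores, thresholds):
    """Balanced accuracy (TPR + TNR) / 2 at each decision threshold.

    Assumption: both classes are present in y_true.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    values = []
    for t in thresholds:
        pred = scores >= t                       # predicted positive at threshold t
        tpr = (pred & y_true).sum() / n_pos      # true positive rate
        tnr = (~pred & ~y_true).sum() / n_neg    # true negative rate
        values.append(0.5 * (tpr + tnr))
    return np.array(values)


def bac_auc(y_true, scores, n_thresholds=101):
    """Area under the balanced accuracy curve over thresholds in [0, 1].

    Assumption: the x-axis is the decision threshold on an even grid.
    """
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    bacc = balanced_accuracy_curve(y_true, scores, thresholds)
    # Trapezoid rule written out by hand.
    area = np.sum((bacc[1:] + bacc[:-1]) * np.diff(thresholds)) / 2.0
    return thresholds, bacc, area
```

Under these assumptions, a classifier that assigns the same score to every object stays at a balanced accuracy of 0.5 for all thresholds, so its area is 0.5 regardless of how imbalanced the classes are; this gives a fixed baseline against which other areas can be read.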
