Proceedings of the 17th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 30

About Classifiers Quality Assessment: Balanced Accuracy Curve (BAC) as an alternative for ROC and PR Curve

DOI: http://dx.doi.org/10.15439/2022F262

Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 149–156

Abstract. We propose a new measure for assessing the effectiveness of classifiers: the area under the curve (AUC) of the balanced accuracy curve (BAC), computed on data with different degrees of class balance. We compare it with the widely used AUC of the ROC and PR curves. To verify the utility of the new measure, we use a global kNN classifier with typical metrics. The BAC, ROC and PR curves yield similar results; the advantage of BAC is that it is simple to implement and its results are easy to interpret.
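
The abstract does not spell out how the balanced accuracy curve is constructed, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes (hypothetically) that balanced accuracy, taken as the mean of sensitivity and specificity, is evaluated at a grid of decision thresholds applied to classifier scores, and that the area under the resulting curve is obtained with the trapezoidal rule. The function names, the threshold grid, and the toy data are all illustrative assumptions.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def bac_curve(y_true, scores, thresholds):
    """Balanced accuracy at each threshold (predict 1 when score >= threshold).

    Assumption: the curve is indexed by decision threshold; the paper may
    index it differently (for example, by the balance degree of the data).
    """
    return [
        balanced_accuracy(y_true, [1 if s >= th else 0 for s in scores])
        for th in thresholds
    ]


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Toy, made-up data for illustration only (four negatives, two positives).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

curve = bac_curve(y_true, scores, thresholds)
area = trapezoid_auc(thresholds, curve)
print(curve, area)
```

Under this reading the computation needs only the confusion-matrix counts per threshold, which is consistent with the abstract's remark about simplicity of implementation. A classifier that predicts a single class at every threshold scores 0.5 throughout.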
