QEDrants—Data Quality Quadrants for Business Users and Decision-Makers
Alina Powała, Dominik Ślęzak
DOI: http://dx.doi.org/10.15439/2024F0005
Citation: Position Papers of the 19th Conference on Computer Science and Intelligence Systems, M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 40, pages 79–88 (2024)
Abstract. The adoption of artificial intelligence (AI) in business is often hindered by the complexity of data quality assessment. This paper introduces a quadrant-based data quality representation framework, which evaluates data assets along two complementary dimensions: Data Integrity (accuracy and reliability, akin to Gartner's "Ability to Execute") and Data Coverage (breadth and comprehensiveness, similar to "Completeness of Vision"). The framework categorizes data into four groups: Pure Gold (AI-ready), Sleeping Giants (high integrity, low coverage), Unpolished Diamonds (high coverage, low integrity), and Hitchhikers (low integrity, low coverage). Each quadrant provides actionable insights for business users, helping them prioritize data assets for AI readiness, identify data cleaning tasks, and balance costs against value realization by focusing on the right data. Given the roots of this idea in QED Software's technology experience, we call the proposed quadrants QEDrants.
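As an illustration only, the four-way categorization can be sketched as a simple lookup on two scores. The abstract does not say how Data Integrity and Data Coverage are measured or where "high" and "low" are separated, so the [0, 1] score range, the 0.5 threshold, the function name `quadrant`, and the sample asset names below are all assumptions rather than part of the published framework.

```python
# Hypothetical cut-off: the abstract does not define how "high" and "low"
# are separated, so 0.5 is an arbitrary illustrative choice.
THRESHOLD = 0.5


def quadrant(integrity: float, coverage: float, threshold: float = THRESHOLD) -> str:
    """Map two scores, assumed to lie in [0, 1], to a quadrant name."""
    if not (0.0 <= integrity <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    high_integrity = integrity >= threshold
    high_coverage = coverage >= threshold
    if high_integrity and high_coverage:
        return "Pure Gold"            # AI-ready
    if high_integrity:
        return "Sleeping Giants"      # high integrity, low coverage
    if high_coverage:
        return "Unpolished Diamonds"  # high coverage, low integrity
    return "Hitchhikers"              # low integrity, low coverage


# Hypothetical data assets with made-up (integrity, coverage) scores.
assets = {
    "crm_contacts": (0.9, 0.8),
    "sensor_logs": (0.85, 0.3),
    "web_clickstream": (0.4, 0.9),
    "legacy_exports": (0.2, 0.1),
}

labels = {name: quadrant(i, c) for name, (i, c) in assets.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

A single shared threshold keeps the sketch short; in practice each dimension would likely need its own cut-off, chosen by the organization assessing its data.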