Discovering relationships between data in enterprise system using log analysis

Łukasz Korzeniowski; Krzysztof Goczyła

DOI: http://dx.doi.org/10.15439/2023F4617

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 141–150 (2023)

Abstract. Enterprise systems are inherently complex, and maintaining a complete, up-to-date overview of them poses a serious challenge for enterprise architecture teams. This problem motivates the search for automated means of discovering knowledge about such systems. An important aspect of this knowledge is understanding the data processed by applications and the relationships between those data. In our previous work, we used the application logs of an enterprise system to derive knowledge about the interactions taking place between applications. In this paper, we explore logs further to discover correspondences between data processed by different applications. Our contributions are the following: we propose a method for discovering relationships between data using log analysis, we validate the method against the AcmeAir benchmark system, and we validate it against a real-life system running at Nordea Bank.
