Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 15

Proceedings of the 2018 Federated Conference on Computer Science and Information Systems

Mining e-mail message sequences from log data

DOI: http://dx.doi.org/10.15439/2018F325

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 845–848


Abstract. Communication by electronic mail (e-mail), once a luxury, is now the usual way to exchange data and information. Widely accepted by Internet users, businesses, and governments, it is claimed to be a key part of the e-revolution. E-mail systems have been successfully implemented in almost all computer-aided domains of human interest, providing efficient, effective, and permanent mechanisms of transmission. To date, however, there is still no capability to exhibit an ordered list (sequence) of e-mail message senders and recipients together with the time elapsed between receiving and answering a message. To fill this gap, in this paper we introduce the SOMF algorithm for mining such sequences from server log data. We specify a three-stage approach to target the problem comprehensively. The first stage is data preparation, which assembles the input for the algorithm. The second stage, data mining, is the automatic analysis of the input data, performed in an unsupervised manner by the SOMF algorithm. The third stage covers visualization, interpretation, and evaluation of the output (knowledge). The case study is based on log data from an operational SMTP server. By design, this simplified example gives a better understanding of the solution and indicates one of its potential applications: identifying and eliminating deadlocks in the realization of business processes. We also tested the efficiency of the implementation of the algorithm in five independent experiments on seven datasets of varying size. The results show that mining even 1 million rows takes roughly 6 minutes or less.
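The kind of output described above (sender, recipient, and the time between receiving and answering) can be illustrated with a minimal sketch. This is not the SOMF algorithm, whose details are not given in the abstract. It is a naive pairing heuristic under stated assumptions: the log has already been reduced to rows of timestamp, sender, and recipient, and a message counts as answered by the first later message sent by its recipient within a time window. The names `LogRow`, `Step`, `mine_steps`, and `max_delay` are hypothetical and introduced only for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class LogRow(NamedTuple):
    """One delivered message, as it might be parsed from a server log (assumed layout)."""
    timestamp: datetime
    sender: str
    recipient: str


class Step(NamedTuple):
    """A received message paired with the next message sent by its recipient."""
    sender: str
    recipient: str
    next_recipient: str
    delay: timedelta  # time between receiving and sending the follow-up


def mine_steps(rows, max_delay=timedelta(days=1)):
    """Pair each message with the first later message sent by its recipient.

    Assumption: a follow-up sent within max_delay counts as the answer.
    """
    ordered = sorted(rows, key=lambda r: r.timestamp)
    steps = []
    for i, received in enumerate(ordered):
        for later in ordered[i + 1:]:
            delay = later.timestamp - received.timestamp
            if delay > max_delay:
                break  # rows are sorted, so every further row is also too late
            if later.sender == received.recipient:
                steps.append(
                    Step(received.sender, received.recipient, later.recipient, delay)
                )
                break  # keep only the first follow-up
    return steps
```

Chaining consecutive steps in which the recipient of one step is the sender of the next gives an ordered sequence of participants. An unusually long delay at one step is the sort of signal that could point to a bottleneck in a business process.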

