Comparative Analysis of AI and Traditional Software GitHub Repositories Using Process Mining
Oguzhan Tasci, Tugba Gurgen Erdogan
DOI: http://dx.doi.org/10.15439/2025F3421
Citation: Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 43, pages 387–392 (2025)
Abstract. This study presents a comparative analysis of GitHub repositories for Artificial Intelligence (AI) software and for traditional software, focusing on workflow efficiency and sentiment dynamics. Using process mining and sentiment analysis techniques, we examine repositories from eight prominent projects, encompassing diverse datasets of issues and pull requests filtered for relevance and consistency. Our findings reveal that AI software repositories exhibit different workflow patterns and sentiment dynamics compared to traditional repositories. Sentiment analysis shows that contributors to AI software repositories display more positive sentiment dynamics, likely reflecting structured workflows and collaborative tools. Conversely, traditional repositories exhibit longer resolution times and more fluctuating sentiment patterns, which may indicate higher complexity or less automation. These insights yield recommendations for optimizing repository management, fostering contributor satisfaction, and improving collaborative software development environments.
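To make the two workflow measures named in the abstract concrete, the following is a minimal sketch of how resolution time and a directly-follows relation (a basic building block of process mining) could be computed from an issue event log. It is not the authors' pipeline: the event log, the activity names, and the helper functions `build_traces`, `directly_follows`, and `resolution_hours` are all hypothetical, and resolution time is assumed here to be the span between the first and last event of a case.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical event log: (case id, activity, ISO timestamp). All values are invented.
EVENTS = [
    ("issue-1", "opened", "2024-01-01T09:00:00"),
    ("issue-1", "labeled", "2024-01-01T10:00:00"),
    ("issue-1", "closed", "2024-01-03T09:00:00"),
    ("issue-2", "opened", "2024-01-02T08:00:00"),
    ("issue-2", "commented", "2024-01-02T12:00:00"),
    ("issue-2", "closed", "2024-01-02T20:00:00"),
]


def build_traces(events):
    """Group events by case and order each case's events by timestamp."""
    traces = {}
    for case_id, activity, ts in sorted(events, key=lambda e: (e[0], e[2])):
        traces.setdefault(case_id, []).append((activity, datetime.fromisoformat(ts)))
    return traces


def directly_follows(traces):
    """Count how often activity b occurs immediately after activity a."""
    dfg = Counter()
    for trace in traces.values():
        for (a, _), (b, _) in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def resolution_hours(traces):
    """Assumed definition: hours between the first and last event of each case."""
    return {
        case_id: (trace[-1][1] - trace[0][1]).total_seconds() / 3600.0
        for case_id, trace in traces.items()
    }


traces = build_traces(EVENTS)
dfg = directly_follows(traces)
hours = resolution_hours(traces)
median_hours = median(hours.values())
```

Comparing `median_hours` and the `dfg` edge counts across two groups of repositories is one simple way such a comparison could be framed; dedicated process mining tools provide far richer discovery and filtering than this sketch.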
References
- João Caldeira, Fernando Brito e Abreu, Jorge Cardoso, Rachel Simões, Toacy Oliveira, and José Pereira dos Reis. Software development analytics in practice: A systematic literature review. Archives of Computational Methods in Engineering, 30(3):2041–2080, 2023.
- Tugba Gurgen Erdogan, Haluk Altunel, and Ayça Kolukısa Tarhan. A process model for ai-enabled software development: A synthesis from validation studies in white literature. Journal of Software: Evolution and Process, page e2743, 2024.
- Wil van der Aalst. Process mining: Data science in action. Springer, 2016.
- Davide Spadini, Maurício Aniche, and Alberto Bacchelli. Pydriller: Python framework for mining software repositories. In Proceedings of the 2018 26th ACM Joint meeting on european software engineering conference and symposium on the foundations of software engineering, pages 908–911, 2018.
- Bohan Liu, He Zhang, Weigang Ma, Hongyu Kuang, Yi Yang, Jinwei Xu, Shan Gao, and Jian Gao. Mining pull requests to detect process anomalies in open source software development. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1–13, 2024.
- Bjoern M Eskofier. Exploration of process mining opportunities in educational software engineering - the gitlab analyser. In Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), pages 601–604, 2020.
- Martin Macak, Daniela Kruzelova, Stanislav Chren, and Barbora Buhnova. Using process mining for git log analysis of projects in a software development course. Education and information technologies, 26(5):5939–5969, 2021.
- Thanh Nguyen, Saimir Bala, and Jan Mendling. Multi-dimensional process analysis of software development projects. pages 179–186, 2024.
- Zhou Yang, Chenyu Wang, Jieke Shi, Thong Hoang, Pavneet Kochhar, Qinghua Lu, Zhenchang Xing, and David Lo. What do users ask in open-source ai repositories? an empirical study of github issues. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 79–91. IEEE, 2023.
- Jeffrey Fairbanks, Akshharaa Tharigonda, and Nasir U Eisty. Analyzing the effects of ci/cd on open source repositories in github and gitlab. In 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA), pages 176–181. IEEE, 2023.
- Rifat Ara Proma and Paul Rosen. Visual analysis of github issues to gain insights. arXiv preprint https://arxiv.org/abs/2407.20900, 2024.
- K Højelse, T Kilbak, J Røssum, E Jäpelt, Leonel Merino, and M Lungu. Git-truck: Hierarchy-oriented visualization of git repository evolution. In 2022 Working Conference on Software Visualization (VISSOFT), pages 131–140. IEEE, 2022.
- Wouter Poncin, Alexander Serebrenik, and Mark van den Brand. Process mining software repositories. In 2011 15th European Conference on Software Maintenance and Reengineering, pages 5–14. IEEE, 2011.
- Francisco Jurado and Pablo Rodriguez. Sentiment analysis in monitoring software development processes: An exploratory case study on github’s project issues. Journal of Systems and Software, 104:82–89, Jun. 2015.
- Bo Yang, Xinjie Wei, and Chao Liu. Sentiments analysis in github repositories: An empirical study. In 2016 Asia-Pacific Software Engineering Conference Workshops (APSECW), pages 67–74. IEEE, 2016.
- Emitza Guzman, David Azócar, and Yang Li. Sentiment analysis of commit comments in github: An empirical study. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), pages 352–355, 2014.
- William N. Robinson, Tianjie Deng, and Zirun Qi. Developer behavior and sentiment from data mining open source repositories. In 2016 49th Hawaii International Conference on System Sciences (HICSS), pages 5386–5395. IEEE, 2016.
- Vinayak Sinha, Alina Lazar, and Bonita Sharif. Analyzing developer sentiment in commit logs. In 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pages 520–523. IEEE, 2016.
- TensorFlow Repository. An open source machine learning framework for everyone, 2025. Last accessed 1 January 2025.
- PyTorch. Tensors and dynamic neural networks in python with strong gpu acceleration, 2025. Last accessed 1 January 2025.
- Scikit-Learn. scikit-learn: machine learning in python, 2025. Last accessed 1 January 2025.
- Keras Team. Deep learning for humans, 2025. Last accessed 1 January 2025.
- Node.js. Node.js is an open-source, cross-platform javascript runtime environment, 2025. Last accessed 1 January 2025.
- Bootstrap. The most popular html, css, and javascript framework for developing responsive, mobile first projects on the web, 2025. Last accessed 1 January 2025.
- Facebook React. The library for web and native user interfaces, 2025. Last accessed 1 January 2025.
- Angular. Deliver web apps with confidence, 2025. Last accessed 1 January 2025.
- Christian W Günther and Anne Rozinat. Disco: Discover your processes. In Demonstration Track of the 10th International Conference on Business Process Management, BPM Demos 2012, pages 40–44. CEUR-WS.org, 2012.
- Alessandro Berti, Sebastiaan van Zelst, and Daniel Schuster. Pm4py: A process mining library for python. Software Impacts, 17:100556, 2023.
- Thomas Mueller. H2 Database Engine, 2023. Version 2.2.222, last accessed 27 May 2025.
- C.J. Hutto and Eric Gilbert. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the 8th International Conference on Weblogs and Social Media (ICWSM-14), pages 216–225. AAAI, 2014.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint https://arxiv.org/abs/1810.04805, 2019. Fine-tuning methodology described within the original BERT paper.