On the Applicability of the Pareto Principle to Source-Code Growth in Open Source Projects
Korneliusz Szymański, Mirosław Ochodek
DOI: http://dx.doi.org/10.15439/2023F5221
Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 781–789 (2023)
Abstract. Context: research on the laws of software-project evolution can indirectly influence how we design software development processes; e.g., knowing how the content of a code repository grows could help us improve the way we monitor the progress of OSS development projects and predict their future development. Goal: our aim is to empirically verify the hypothesis that OSS code repositories grow in size according to the Pareto principle. Method: we collected and curated a sample of 31,343 OSS code repositories hosted on GitHub and analyzed their content growth over time to verify whether it follows the Pareto principle. Results: we observed that, on average, monotonically growing OSS repositories reach 75% of their final content size within the first 25% of revisions. Conclusions: the content size of monotonically growing OSS repositories seems to grow according to the Pareto principle with a 75/25 ratio.
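The 75/25 ratio reported above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' measurement procedure: it assumes a repository's history is available as a list of content sizes, one per revision in chronological order, and the function names and the toy history are hypothetical.

```python
from math import ceil
from typing import Sequence


def is_monotonically_growing(sizes: Sequence[float]) -> bool:
    """Check that the content size never shrinks between consecutive revisions."""
    return all(earlier <= later for earlier, later in zip(sizes, sizes[1:]))


def size_share_at(sizes: Sequence[float], revision_fraction: float = 0.25) -> float:
    """Return the share of the final size reached after the leading fraction of revisions.

    Assumption: `sizes` holds one size measurement per revision, oldest first.
    """
    if not sizes:
        raise ValueError("sizes must contain at least one revision")
    if not 0 < revision_fraction <= 1:
        raise ValueError("revision_fraction must be in (0, 1]")
    final_size = sizes[-1]
    if final_size <= 0:
        raise ValueError("final size must be positive")
    # Number of revisions in the leading fraction (always at least one).
    cutoff = max(1, ceil(revision_fraction * len(sizes)))
    return sizes[cutoff - 1] / final_size


# Hypothetical toy history of 8 revisions (sizes in lines); not data from the study.
toy_history = [400, 750, 800, 850, 900, 950, 980, 1000]

if is_monotonically_growing(toy_history):
    share = size_share_at(toy_history, revision_fraction=0.25)
    print(f"Share of final size after the first 25% of revisions: {share:.0%}")
```

In this toy history the first 25% of revisions are the first two of eight, and the size after the second revision is 750 of the final 1000 lines, so the share is 75%. Averaging this share over many repositories would give a figure comparable in spirit to the one in the abstract, under the assumptions stated above.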