A case study on machine learning model for code review expert system in software engineering

Michał Madera; Rafał Tomoń

DOI: http://dx.doi.org/10.15439/2017F536

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 1357–1363 (2017)

Abstract. Code review is a key tool for quality assurance in software development. It is intended to find coding mistakes overlooked during the development phase and to lower the risk of bugs in the final product. In large and complex projects, accurate code review is a challenging task. Because code review depends on the individual reviewer's predisposition, a certain share of source code changes is not checked as thoroughly as it should be. In this paper we propose a machine learning approach for identifying project artifacts that are at significant risk of failure. Planning and adjusting quality assurance (QA) activities could benefit strongly from an accurate estimate of the software areas endangered by defects, because extended code review could be directed there. The proposed approach has been evaluated for feasibility on a large medical software project. Significant work was done to extract features from heterogeneous production data, leading to a good predictive model. Our preliminary research results were considered worthy of implementation in the company where the research was conducted, which opens opportunities for continuing the studies.
