An Empirical Study on Application of Word Embedding Techniques for Prediction of Software Defect Severity Level

Lov Kumar; Mukesh Kumar; Lalita Bhanu Murthy; Sanjay Misra; Vipul Kocher; Srinivas Padmanabhuni

DOI: http://dx.doi.org/10.15439/2021F100

Citation: Proceedings of the 16th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 25, pages 477–484 (2021)

Abstract. This work develops defect severity level prediction models that assign a severity level to a defect based on its bug report. Seven word embedding techniques are applied to the defect descriptions so that each word is represented not as a single number but as a vector in n-dimensional space. Three feature selection techniques are then applied to find a relevant subset of these vectors. The effectiveness of the word embedding techniques and of the different sets of vectors is evaluated using different classification techniques, with SMOTE applied to overcome the class imbalance problem.
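The class-balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is an interpolation between a minority-class vector and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy two-dimensional vectors are assumptions made for illustration only; in the study the inputs would be embedding vectors of defect descriptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic vectors interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Place the new point at a random position on the segment base -> other.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy stand-ins for embedding vectors of a rare severity level (assumed data).
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_vectors, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already occupied by that class. In a full pipeline, oversampling of this kind would normally be applied to the training split only, so that no synthetic points leak into the evaluation data.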
