Software Sentiment Analysis using Deep-learning Approach with Word-Embedding Techniques
Venkata Krishna Chandra Mula, Lov Kumar, Lalita Bhanu Murthy, Aneesh Krishna
Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 873–882 (2022)
Abstract. Sentiment analysis for the software engineering community helps to find important information for various tasks, including suggestions to improve code quality, defect-related comments on source code, and possibilities for improvement. Manually identifying sentiment-bearing comments is time-consuming and can be inaccurate. Automating the sentiment analysis process with Machine Learning models can benefit software professionals by giving them, at a glance, other developers' insights and feelings about software products, libraries, development, and maintenance tasks. This study aims to develop software sentiment prediction models based on comments by (1) identifying the best embedding techniques to represent the words of the comments, not just as numbers but as vectors in n-dimensional space, (2) finding the best sets of vectors using different feature selection techniques, (3) finding the best methods to handle the class-imbalanced nature of the data, and (4) finding the best deep-learning architecture for training the models. The developed models are validated using 5-fold cross-validation with four performance parameters (accuracy, AUC, recall, and precision) on three different datasets. The experimental findings show that models developed using word embeddings with feature selection and Deep Learning classifiers on balanced data can significantly predict the underlying sentiments of textual comments.
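The evaluation setup named in the abstract (class balancing, 5-fold cross-validation, and the four performance parameters) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names are hypothetical, random oversampling stands in for whichever balancing methods the study actually compares, and labels are assumed to be binary (1 = positive class, 0 = negative class) for simplicity.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (assumed stand-in for
    the balancing methods studied in the paper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index lists
    for each of the k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive sample scores
    higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In a setup like this, oversampling would normally be applied to the training indices of each fold only, so that duplicated samples never leak into the held-out fold used for scoring.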