Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 18

Proceedings of the 2019 Federated Conference on Computer Science and Information Systems

Handling of Categorical Data in Software Development Effort Estimation: A Systematic Mapping Study


DOI: http://dx.doi.org/10.15439/2019F222

Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 763–770


Abstract. Producing reliable and accurate estimates of software effort remains a difficult task in software project management, especially at the early stages of the software life cycle, when the available information is more categorical than numerical. In this paper, we conducted a systematic mapping study of papers dealing with categorical data in software development effort estimation (SDEE). In total, 27 papers published between 1997 and January 2019 were identified. The selected studies were analyzed and classified according to eight criteria: publication channel, year of publication, research approach, contribution type, SDEE technique, technique used to handle categorical data, type of categorical data, and datasets used. The results showed that most of the selected papers investigate the use of both nominal and ordinal data. Furthermore, in analogy-based estimation, Euclidean distance, fuzzy logic, and fuzzy clustering were the most frequently used techniques for handling categorical data, while in regression-based estimation most papers employed ANOVA and the combination of categories.

