
Communication Papers of the 17th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 32

Web Intrusion Detection Using Character Level Machine Learning Approaches with Upsampled Data


DOI: http://dx.doi.org/10.15439/2022F147

Citation: Communication Papers of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 32, pages 269–274 (2022)


Abstract. Today, people meet many of their needs online, in areas such as shopping, health, and finance. Alongside the many legitimate users who visit websites for their own purposes, there are attackers who send malicious requests to steal users' personal data, obtain website owners' information, and damage the application. Attacks such as SQL injection and cross-site scripting (XSS) can seriously harm web applications and their users. Detecting these attacks manually is very time-consuming, and manual detection is difficult to adapt to new attack types. The proposed study performs attack detection with several machine learning and deep learning approaches on a larger dataset obtained by combining the CSIC 2012 and ECML/PKDD datasets. We evaluated the classification results of the different algorithms in terms of computation time and accuracy. In addition, we ran experiments on the various learning models using our upsampling method to balance the dataset labels. In binary classification, LSTM achieved the best accuracy, and upsampling the data was observed to improve accuracy. LightGBM was the fastest algorithm in terms of computation time.
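The abstract does not spell out the preprocessing, so the following is only a minimal sketch under stated assumptions. Each HTTP request is treated as a raw string and encoded character by character into a fixed-length index sequence (the kind of input a sequence model such as an LSTM with an embedding layer can consume), and label balancing is done by simple random oversampling: minority-class samples are duplicated with replacement until every class matches the largest one. The function names `build_vocab`, `encode`, and `upsample`, the padding index 0 and unknown index 1, `max_len=32`, and the toy requests are all hypothetical; the authors' actual upsampling method may differ.

```python
import random
from collections import Counter


def build_vocab(texts):
    """Map each character seen in the training texts to an index.

    Hypothetical convention: index 0 is padding, index 1 is an unseen character.
    """
    chars = sorted({c for t in texts for c in t})
    return {c: i + 2 for i, c in enumerate(chars)}


def encode(text, vocab, max_len):
    """Encode a request string as a fixed-length list of character indices."""
    ids = [vocab.get(c, 1) for c in text[:max_len]]  # truncate long requests
    return ids + [0] * (max_len - len(ids))          # right-pad short ones with 0


def upsample(samples, labels, seed=0):
    """Random oversampling: duplicate minority-class samples (with replacement)
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


if __name__ == "__main__":
    # Toy requests and labels (0 = normal, 1 = attack); not from the paper's datasets.
    requests = [
        "GET /index.jsp?id=2",
        "GET /search?q=shoes",
        "GET /login?user=admin'--",
        "GET /cart?item=7",
        "GET /profile?tab=orders",
    ]
    labels = [0, 0, 1, 0, 0]

    vocab = build_vocab(requests)
    X = [encode(r, vocab, max_len=32) for r in requests]
    X_bal, y_bal = upsample(X, labels)
    print(Counter(y_bal))  # both classes now have 4 samples
    print(len(X_bal[0]))   # every encoded sequence has length 32
```

Because random oversampling only duplicates existing requests, in a setup like this it belongs after the train/test split and should be applied to the training split only; otherwise copies of the same request can appear on both sides of the split and inflate the measured accuracy.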
