Web Intrusion Detection Using Character Level Machine Learning Approaches with Upsampled Data

Talya Tümer Sivri; Nergis Pervan Akman; Ali Berkol; Can Peker

DOI: http://dx.doi.org/10.15439/2022F147

Citation: Communication Papers of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 32, pages 269–274 (2022)

Abstract. Today, people meet many of their needs online, in areas such as shopping, health, and finance. Alongside the many well-meaning users of websites, there are also attackers who send malicious requests to steal users' personal data, obtain website owners' information, or damage the application. Attacks such as SQL injection and cross-site scripting (XSS) can seriously harm web applications and their users. Detecting these attacks manually is very time-consuming, and manual detection is difficult to adapt to new attack types. This study performs attack detection with several machine learning and deep learning approaches on a larger dataset obtained by combining the CSIC 2012 and ECML/PKDD datasets. We evaluate the classification results of the different algorithms in terms of computation time and accuracy. We also run the experiments with our data upsampling method, which balances the dataset labels. In the binary classification task, LSTM achieves the best accuracy, and the upsampled data has a positive effect on accuracy. LightGBM is the fastest algorithm in terms of computation time.
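The two ideas named in the abstract, character-level input and label balancing by upsampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their upsampling method, so plain random oversampling with replacement is assumed here. The helper names (`build_vocab`, `char_encode`, `upsample`), the maximum length of 32, and the four toy requests are all hypothetical.

```python
import random
from collections import Counter


def build_vocab(requests):
    """Assign an integer id to every character seen; 0 = padding, 1 = unknown."""
    chars = sorted({ch for r in requests for ch in r})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def char_encode(request, vocab, max_len=32):
    """Turn a request string into a fixed-length list of character ids."""
    ids = [vocab.get(ch, 1) for ch in request[:max_len]]
    return ids + [0] * (max_len - len(ids))


def upsample(samples, labels, seed=0):
    """Assumed method: random oversampling with replacement, so that every
    label ends up with as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for y, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data (hypothetical): label 0 = normal request, label 1 = attack.
requests = [
    "GET /index.html",
    "GET /cart?id=7",
    "POST /login",
    "GET /item?id=1' OR '1'='1",
]
labels = [0, 0, 0, 1]

balanced_requests, balanced_labels = upsample(requests, labels)
vocab = build_vocab(requests)
encoded = [char_encode(r, vocab) for r in balanced_requests]
label_counts = Counter(balanced_labels)
```

The encoded sequences could then be fed to a sequence model such as an LSTM, or flattened into features for a tree-based model such as LightGBM. In practice, oversampling is usually applied to the training split only, so that duplicated samples do not leak into the test set.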
