Data Mining-Based Phishing Detection
Jan Bohacik, Ivan Skula, Michal Zabovsky
DOI: http://dx.doi.org/10.15439/2020F140
Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 27–30 (2020)
Abstract. Webpages are easy to fake, and with so many internet users, some of them inevitably fall victim to fake pages. At the same time, more and more activities such as banking and shopping are moving to the internet, so a successful phishing attack may lead to large financial losses. This paper describes a Chrome plugin developed for data mining-based detection of phishing webpages. The plugin is written in JavaScript and uses a C4.5 decision tree model built from collected data described by eight attributes. The usability of the model is validated with 10-fold cross-validation, computing sensitivity, specificity and overall accuracy. The experimental results are promising.
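The three validation measures named in the abstract can be sketched from confusion-matrix counts as follows. This is a minimal illustration in Python, not the authors' code: the names `ConfusionCounts` and `k_fold_indices` are hypothetical, phishing is assumed to be the positive class, the counts in the usage example are made up rather than taken from the paper, and the fold split is a simple unshuffled striding scheme (a real experiment would typically shuffle or stratify).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts; phishing is assumed to be the positive class."""
    tp: int  # phishing pages classified as phishing
    fn: int  # phishing pages classified as legitimate
    tn: int  # legitimate pages classified as legitimate
    fp: int  # legitimate pages classified as phishing


def sensitivity(c: ConfusionCounts) -> float:
    """Share of phishing pages that were detected: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of legitimate pages that were accepted: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all pages classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def k_fold_indices(n: int, k: int = 10) -> list[list[int]]:
    """Split indices 0..n-1 into k disjoint test folds by striding.

    Each fold serves once as the test set; the remaining indices
    form the training set for that round.
    """
    return [list(range(i, n, k)) for i in range(k)]


if __name__ == "__main__":
    # Made-up counts for illustration only (not results from the paper).
    counts = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
    print(sensitivity(counts), specificity(counts), accuracy(counts))
    print([len(fold) for fold in k_fold_indices(25, k=10)])
```

In 10-fold cross-validation the counts are usually accumulated over all ten test folds before the three measures are computed, so every example contributes to the estimate exactly once.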
References
- Anti-Phishing Working Group, Phishing Activity Trends Report, 1st Quarter 2020. USA: Anti-Phishing Working Group, 2020, https://docs.apwg.org/reports/apwg_trends_report_q1_2020.pdf.
- E. d. Argaez, Internet Usage Statistics: The Internet Big Picture, Bogota, Colombia: Internet World Stats, 2020, https://www.internetworldstats.com/stats.htm.
- D. Dua and C. Graff, UCI Machine Learning Repository, USA: University of California, School of Information and Computer Science, 2019, http://archive.ics.uci.edu/ml.
- T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2009, https://dx.doi.org/10.1007/978-0-387-84858-7.
- M. Karabatak, T. Mustafa, “Performance comparison of classifiers on reduced phishing website dataset,” in Proc. of the International Symposium on Digital Forensic and Security, IEEE, Turkey, 2018, pp. 1-5, https://dx.doi.org/10.1109/ISDFS.2018.8355357.
- B. M. Lawrence, How to Make Fake Web Pages. Techwalla, 2020, https://www.techwalla.com/articles/how-to-make-fake-web-pages.
- K. Pancerz, V. Levashenko, E. Zaitseva, J. Gomuła, “Experiments with classification of MMPI profiles using fuzzy decision trees,” in Proc. of the Federated Conference on Computer Science and Information Systems, IEEE, Poland, 2018, pp. 125-128, https://dx.doi.org/10.15439/2018F111.
- S. Patil, S. Dhage, “A methodical overview on phishing detection along with an organized way to construct an anti-phishing framework,” in Proc. of the International Conference on Advanced Computing & Communication Systems, IEEE, India, 2019, pp. 588-593, https://dx.doi.org/10.1109/ICACCS.2019.8728356.
- Y. Pristyanto, A. Dahlan, “Hybrid resampling for imbalanced class handling on web phishing classification dataset,” in Proc. of the International Conference on Information Technology, Information Systems and Electrical Engineering, IEEE, Indonesia, 2019, pp. 401-406, https://dx.doi.org/10.1109/ICITISEE48480.2019.9003803.
- J. Sonmez, How to Create a Chrome Extension in 10 Minutes Flat. Australia: sitepoint, 2015, https://www.sitepoint.com/create-chrome-extension-10-minutes-flat/.
- Statista, Digital Payments: Worldwide. Germany: Statista, 2020, https://www.statista.com/outlook/296/100/digital-payments/worldwide.
- S. V. Stehman, “Selecting and interpreting measures of thematic classification accuracy,” Remote Sensing of Environment, vol. 62, no. 1, pp. 77–89, 1997, https://dx.doi.org/10.1016/S0034-4257(97)00083-7.
- L. Wenyin, G. Huang, L. Xiaoyue, X. Deng, and Z. Min, “Phishing web page detection,” in Proc. of the International Conference on Document Analysis and Recognition, IEEE, South Korea, 2005, pp. 560–564, https://dx.doi.org/10.1109/ICDAR.2005.190.
- R. Wahyudi, H. Marcos, U. Hasanah, B. P. Hartato, T. Astuti, R. A. Prasetyo, “Algorithm evaluation for classification ‘phishing website’ using several classification algorithms,” in Proc. of the International Conference on Information Technology, Information Systems and Electrical Engineering, IEEE, Indonesia, 2018, pp. 265-270, https://dx.doi.org/10.1109/ICITISEE.2018.8720975.
- I. H. Witten, E. Frank, M. A. Hall, C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. USA: Morgan Kaufmann, 2017, https://dx.doi.org/10.1016/C2015-0-02071-8.