Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 21

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems

An incremental malware detection model for meta-feature API and system call sequence


DOI: http://dx.doi.org/10.15439/2020F73

Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 629–638


Abstract. Detecting malware variants is becoming harder by the day, and each newer variant makes detection tougher still. The enormous volume and diversity of malware has pushed us toward newer techniques such as machine learning. In this work, we propose an incremental malware detection model for meta-feature API and system call sequences. We represent host behaviour as a sequence of API calls and system calls. To collect sequential system calls, we use NITRSCT (NITR System call Tracer); for sequential API calls, we generate a list of anomaly scores for each API call sequence using Numenta Hierarchical Temporal Memory (N-HTM). We convert each API call sequence into six meta-features that describe its influence. We perform feature selection using a correlation matrix with a heatmap to select the best meta-features. An incremental malware detection model then decides the label of the binary executable under study. We classify malware samples into their respective types and demonstrate via a case study that our proposed model can reduce the effort required in the STS-Tool (Socio-Technical Security Tool) approach and the Abuse case approach. Theoretical analysis and real-life experiments show that our model is efficient and achieves 95.2% accuracy. The detection speed of our proposed model is 0.03 s. We address the issue of limited precision and recall in malware detection, and we meet the user's requirement by fixing the trade-off between accuracy and speed.
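
The meta-feature and feature-selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the six meta-features, so the six statistics below (mean, standard deviation, minimum, maximum, skewness, kurtosis) are assumed purely for illustration, and the function names, the greedy selection rule, and the 0.9 correlation threshold are likewise hypothetical. The sketch summarises each anomaly-score sequence as six numbers, then drops any meta-feature that is strongly correlated with one already kept, which is one common way to act on a correlation matrix.

```python
import math
from statistics import mean, pstdev


def meta_features(scores):
    """Summarise one anomaly-score sequence as six illustrative statistics.

    The choice of statistics is an assumption; the source only says "six meta-features".
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant sequence has no shape; report zero for both moments.
        skew, kurt = 0.0, 0.0
    else:
        n = len(scores)
        skew = sum(((s - mu) / sigma) ** 3 for s in scores) / n
        kurt = sum(((s - mu) / sigma) ** 4 for s in scores) / n
    return {
        "mean": mu,
        "std": sigma,
        "min": min(scores),
        "max": max(scores),
        "skewness": skew,
        "kurtosis": kurt,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(rows, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    `rows` is a list of dicts, one per sample, all sharing the same keys.
    The threshold value is a hypothetical default, not taken from the source.
    """
    names = list(rows[0])
    columns = {name: [row[name] for row in rows] for name in names}
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

In use, `meta_features` would be applied to the anomaly-score list of each API call sequence, and `select_features` to the resulting list of dictionaries; the surviving feature names would then feed whichever classifier is trained incrementally.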
