An incremental malware detection model for meta-feature API and system call sequence
Pushkar Kishore, Swadhin Kumar Barisal, Durga Prasad Mohapatra
DOI: http://dx.doi.org/10.15439/2020F73
Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 629–638 (2020)
Abstract. Detecting malware variants is becoming steadily harder, and each new variant adds to the difficulty. The sheer volume and diversity of malware has pushed us towards techniques such as machine learning. In this work, we propose an incremental malware detection model for meta-feature API and system call sequences. We represent host behaviour as a sequence of API calls and system calls. To collect system call sequences, we use NITRSCT (NITR System Call Tracer); for API call sequences, we generate a list of anomaly scores for each sequence using Numenta Hierarchical Temporal Memory (N-HTM). We convert each API call sequence into six meta-features that describe its influence. We select the best meta-features using a correlation matrix visualised as a heatmap. The incremental malware detection model then decides the label of the binary executable under study. We classify malware samples into their respective types and demonstrate via a case study that our proposed model can reduce the effort required in the STS-Tool (Socio-Technical Security Tool) approach and abuse cases. Theoretical analysis and real-life experiments show that our model is efficient and achieves 95.2% accuracy, with a detection time of 0.03 s. The model addresses the problem of limited precision and recall in malware detection, and it meets user requirements by balancing the trade-off between accuracy and speed.
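The meta-feature and feature-selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the six meta-features or the selection rule, so everything here is an assumption: the six statistics (mean, standard deviation, minimum, maximum, skewness, excess kurtosis), the function names `meta_features` and `select_uncorrelated`, the 0.9 correlation threshold, and the random stand-in data used in place of N-HTM anomaly scores. The heatmap itself is omitted; the matrix `corr` is what it would visualise.

```python
import numpy as np

# Hypothetical meta-feature names; the paper's actual six are not given in the abstract.
NAMES = ["mean", "std", "min", "max", "skewness", "kurtosis"]


def meta_features(scores):
    """Summarise one anomaly-score sequence as six illustrative statistics."""
    x = np.asarray(scores, dtype=float)
    mean = x.mean()
    std = x.std()
    centred = x - mean
    # Skewness and kurtosis are undefined for constant sequences; use 0.0 there.
    skew = (centred ** 3).mean() / std ** 3 if std > 0 else 0.0
    kurt = (centred ** 4).mean() / std ** 4 - 3.0 if std > 0 else 0.0
    return np.array([mean, std, x.min(), x.max(), skew, kurt])


def select_uncorrelated(features, names, threshold=0.9):
    """Greedily keep columns whose |Pearson r| with every kept column is below threshold."""
    corr = np.corrcoef(features, rowvar=False)  # columns are features
    kept = []
    for j in range(features.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept], corr


# Stand-in data: 20 random sequences in place of real anomaly scores.
rng = np.random.default_rng(0)
sequences = [rng.random(50) for _ in range(20)]
X = np.vstack([meta_features(s) for s in sequences])
kept, corr = select_uncorrelated(X, NAMES)
print("kept meta-features:", kept)
```

The greedy rule keeps the first column of each highly correlated group, which is one simple way to act on a correlation matrix; other criteria, such as ranking by correlation with the class label, would fit the same description.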