
Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 39

Model-Agnostic Machine Learning Model Updating – A Case Study on a Real-World Application


DOI: http://dx.doi.org/10.15439/2024F4426

Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 157–167


Abstract. Applying scientific developments in the real world is the ultimate aim of all scientific work. For Data Science and Machine Learning, this means additional tasks must be addressed beyond the rather academic part of "just" building a model from the available data. In the widely accepted Cross-Industry Standard Process for Data Mining (CRISP-DM), one of these tasks is the maintenance of the deployed application. This task can be of great importance, since in real-world applications model performance often degrades over time, usually due to concept drift, which directly leads to the need to adapt or update the deployed Machine Learning model. In this work, available model-agnostic model update methods are evaluated on a real-world industrial application: Virtual Metrology in semiconductor fabrication. The results show that sliding-window techniques performed best on this real-world use case. The models used in the experiments were an XGBoost model and a Neural Network. For the Neural Network, Model-Agnostic Meta-Learning and Learning to Learn by Gradient Descent by Gradient Descent were applied as update techniques (among others) and showed no improvement over the baseline of not updating the Neural Network. The implementation of these update techniques was validated on an artificial use case, on which they worked well.
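
To make the sliding-window idea concrete, the following Python sketch (not taken from the paper) refits an XGBoost regressor on only the most recent observations of a drifting data stream; the window size, the synthetic stream, and the model settings are illustrative assumptions rather than the authors' setup.

    # Minimal sketch of sliding-window model updating (illustrative, not the authors' code).
    # Assumptions: the window size, the synthetic drifting stream, and the XGBoost settings.
    import numpy as np
    from collections import deque
    from xgboost import XGBRegressor

    rng = np.random.default_rng(0)
    WINDOW = 500  # number of most recent samples kept for retraining

    def stream(n_batches=20, batch_size=100, n_features=8):
        """Yield batches from a synthetic regression stream with gradual concept drift."""
        w = rng.normal(size=n_features)
        for _ in range(n_batches):
            X = rng.normal(size=(batch_size, n_features))
            w = w + 0.05 * rng.normal(size=n_features)  # slowly drifting coefficients
            y = X @ w + 0.1 * rng.normal(size=batch_size)
            yield X, y

    window_X, window_y = deque(maxlen=WINDOW), deque(maxlen=WINDOW)
    model = None
    for X_batch, y_batch in stream():
        if model is not None:
            mae = np.mean(np.abs(model.predict(X_batch) - y_batch))  # evaluate before updating
            print(f"MAE on incoming batch: {mae:.3f}")
        window_X.extend(X_batch)   # oldest samples fall out once the window is full
        window_y.extend(y_batch)
        model = XGBRegressor(n_estimators=100, max_depth=4)
        model.fit(np.asarray(window_X), np.asarray(window_y))  # refit on the current window only

The deque drops the oldest samples automatically, so each refit only sees data assumed to reflect the current concept; in practice the window size trades off adaptation speed against the amount of available training data.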

References

  1. V. Maitra, Y. Su, and J. Shi, “Virtual metrology in semiconductor manufacturing: Current status and future prospects,” Expert Systems with Applications, vol. 249, p. 123559, 2024. http://dx.doi.org/10.1016/j.eswa.2024.123559
  2. S. Yan, C. Luo, S. Wang, S. Ding, L. Li, J. Ai, Q. Sheng, Q. Xia, Z. Li, Q. Chen, S. Li, H. Dai, and Y. Zhong, “Virtual metrology modeling for CVD film thickness with LASSO-Gaussian process regression,” in 2023 China Semiconductor Technology International Conference (CSTIC), 2023. http://dx.doi.org/10.1109/CSTIC58779.2023.10219236 pp. 1–4.
  3. C. Schröer, F. Kruse, and J. M. Gómez, “A systematic literature review on applying CRISP-DM process model,” Procedia Computer Science, vol. 181, pp. 526–534, 2021. http://dx.doi.org/10.1016/j.procs.2021.01.199
  4. F. Bayram, B. S. Ahmed, and A. Kassler, “From concept drift to model degradation: An overview on performance-aware drift detectors,” Knowledge-Based Systems, vol. 245, p. 108632, 2022. http://dx.doi.org/10.1016/j.knosys.2022.108632
  5. A. L. Suárez-Cetrulo, D. Quintana, and A. Cervantes, “A survey on machine learning for recurring concept drifting data streams,” Expert Systems with Applications, vol. 213, p. 118934, 2023. http://dx.doi.org/10.1016/j.eswa.2022.118934. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417422019522
  6. J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Computing Surveys, vol. 46, no. 4, pp. 1–37, Mar. 2014. http://dx.doi.org/10.1145/2523813
  7. J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under concept drift: A review,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 12, pp. 2346–2363, 2019. http://dx.doi.org/10.1109/TKDE.2018.2876857
  8. A. Choudhary, P. Jha, A. Tiwari, and N. Bharill, “A brief survey on concept drifted data stream regression,” in Soft Computing for Problem Solving, A. Tiwari, K. Ahuja, A. Yadav, J. C. Bansal, K. Deep, and A. K. Nagar, Eds. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-2712-5_57 pp. 733–744.
  9. A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, “Generative adversarial networks: An overview,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53–65, 2018. http://dx.doi.org/10.1109/MSP.2017.2765202
  10. Y. Song, G. Zhang, J. Lu, and H. Lu, “A fuzzy kernel c-means clustering model for handling concept drift in regression,” in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017. http://dx.doi.org/10.1109/FUZZ-IEEE.2017.8015515 pp. 1–6.
  11. D. Liu, Y. Wu, and H. Jiang, “FP-ELM: An online sequential learning algorithm for dealing with concept drift,” Neurocomputing, vol. 207, pp. 322–334, 2016. http://dx.doi.org/10.1016/j.neucom.2016.04.043
  12. S. J. Delany, P. Cunningham, A. Tsymbal, and L. Coyle, “A case-based technique for tracking concept drift in spam filtering,” in Applications and Innovations in Intelligent Systems XII, A. Macintosh, R. Ellis, and T. Allen, Eds. London: Springer London, 2005. http://dx.doi.org/10.1007/1-84628-103-2_1. ISBN 978-1-84628-103-7 pp. 3–16.
  13. Y. Song, G. Zhang, H. Lu, and J. Lu, “A noise-tolerant fuzzy c-means based drift adaptation method for data stream regression,” in 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2019. http://dx.doi.org/10.1109/FUZZ-IEEE.2019.8859005 pp. 1–6.
  14. J. Vanschoren, “Meta-learning: A survey,” arXiv preprint https://arxiv.org/abs/1810.03548, 2018. http://dx.doi.org/10.48550/arXiv.1810.03548
  15. J. Son, S. Lee, and G. Kim, “When meta-learning meets online and continual learning: A survey,” 2023. http://dx.doi.org/10.48550/arXiv.2311.05241
  16. A. Nagabandi, I. Clavera, S. Liu, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn, “Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,” 2019.
  17. S. Lee, H. Jeon, J. Son, and G. Kim, “Sequential bayesian continual learning with meta-learned neural networks,” 2024. [Online]. Available: https://openreview.net/forum?id=6r0BOIb771
  18. J. von Oswald, C. Henning, B. F. Grewe, and J. Sacramento, “Continual learning with hypernetworks,” 2022. http://dx.doi.org/10.48550/arXiv.1906.00695
  19. K. Li and J. Malik, “Learning to optimize,” 2016. http://dx.doi.org/10.48550/arXiv.1606.01885
  20. H. M. Gomes, J. P. Barddal, L. E. B. Ferreira, and A. Bifet, “Adaptive random forests for data stream regression,” in European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2018. [Online]. Available: https://www.ppgia.pucpr.br/~jean.barddal/assets/pdf/arf_regression.pdf
  21. J. Montiel, R. Mitchell, E. Frank, B. Pfahringer, T. Abdessalem, and A. Bifet, “Adaptive XGBoost for evolving data streams,” in 2020 International Joint Conference on Neural Networks (IJCNN), 2020. http://dx.doi.org/10.1109/IJCNN48605.2020.9207555 pp. 1–8.
  22. F. M. de Souza, J. Grando, and F. Baldo, “Adaptive fast XGBoost for regression,” in Intelligent Systems, J. C. Xavier-Junior and R. A. Rios, Eds. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21686-2_7. ISBN 978-3-031-21686-2 pp. 92–106.
  23. J. Zheng, F. Shen, H. Fan, and J. Zhao, “An online incremental learning support vector machine for large-scale data,” Neural Computing and Applications, vol. 22, pp. 1023–1035, 2013. http://dx.doi.org/10.1007/s00521-011-0793-1
  24. Ł. Korycki and B. Krawczyk, “Adaptive deep forest for online learning from drifting data streams,” 2020. http://dx.doi.org/10.48550/arXiv.2010.07340
  25. A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, “One-shot learning with memory-augmented neural networks,” 2016. http://dx.doi.org/10.48550/arXiv.1605.06065
  26. S. Xu and J. Wang, “Dynamic extreme learning machine for data stream classification,” Neurocomputing, vol. 238, pp. 433–449, 2017. http://dx.doi.org/10.1016/j.neucom.2016.12.078
  27. N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel, “A simple neural attentive meta-learner,” 2018. http://dx.doi.org/10.48550/arXiv.1707.03141
  28. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and issues in data stream systems,” in Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, ser. PODS ’02. New York, NY, USA: Association for Computing Machinery, 2002. http://dx.doi.org/10.1145/543613.543615. ISBN 1581135076 p. 1–16.
  29. R. Klinkenberg, “Learning drifting concepts: Example selection vs. example weighting,” Intell. Data Anal., vol. 8, pp. 281–300, 2004. http://dx.doi.org/10.3233/IDA-2004-8305
  30. M. Andrychowicz, M. Denil, S. Gómez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. de Freitas, “Learning to learn by gradient descent by gradient descent,” in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds., vol. 29. Curran Associates, Inc., 2016. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2016/file/fb87582825f9d28a8d42c5e5e5e8b23d-Paper.pdf
  31. N. Oza, “Online bagging and boosting,” in 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 3, 2005. http://dx.doi.org/10.1109/ICSMC.2005.1571498 pp. 2340–2345.
  32. E. Lughofer, “Efficient sample selection in data stream regression employing evolving generalized fuzzy models,” in 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2015. http://dx.doi.org/10.1109/FUZZ-IEEE.2015.7337844 pp. 1–9.
  33. Y. Song, G. Zhang, J. Lu, and H. Lu, “A fuzzy kernel c-means clustering model for handling concept drift in regression,” in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017. http://dx.doi.org/10.1109/FUZZ-IEEE.2017.8015515 pp. 1–6.
  34. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. http://dx.doi.org/10.1162/neco.1997.9.8.1735
  35. L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012. http://dx.doi.org/10.1109/MSP.2012.2211477
  36. A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009. [Online]. Available: https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf
  37. C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 06–11 Aug 2017, pp. 1126–1135. [Online]. Available: https://proceedings.mlr.press/v70/finn17a.html
  38. Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas, “Learning to learn without gradient descent by gradient descent,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017. http://dx.doi.org/10.5555/3305381.3305459 p. 748–756.
  39. W. N. Street and Y. Kim, “A streaming ensemble algorithm (SEA) for large-scale classification,” in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’01. New York, NY, USA: Association for Computing Machinery, 2001. http://dx.doi.org/10.1145/502512.502568. ISBN 158113391X pp. 377–382.
  40. J. Z. Kolter and M. A. Maloof, “Dynamic weighted majority: An ensemble method for drifting concepts,” The Journal of Machine Learning Research, vol. 8, pp. 2755–2790, 2007. [Online]. Available: http://jmlr.org/papers/v8/kolter07a.html
  41. M. P. S. Bhatia, “A two ensemble system to handle concept drifting data streams: recurring dynamic weighted majority,” International Journal of Machine Learning and Cybernetics, vol. 10, Mar. 2019. http://dx.doi.org/10.1007/s13042-017-0738-9
  42. A. Liu, J. Lu, and G. Zhang, “Diverse instance-weighting ensemble based on region drift disagreement for concept drift adaptation,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 293–307, 2020. http://dx.doi.org/10.1109/TNNLS.2020.2978523
  43. B. Celik and J. Vanschoren, “Adaptation strategies for automated machine learning on evolving data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 9, p. 3067–3078, Sep. 2021. http://dx.doi.org/10.1109/tpami.2021.3062900
  44. T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin, “Learning to optimize: A primer and a benchmark,” Journal of Machine Learning Research, vol. 23, no. 189, pp. 1–59, 2022. [Online]. Available: http://jmlr.org/papers/v23/21-0308.html
  45. S. Wang, J. Sun, and Z. Xu, “HyperAdam: A learnable task-adaptive Adam for network training,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5297–5304, Jul. 2019. http://dx.doi.org/10.1609/aaai.v33i01.33015297
  46. S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint https://arxiv.org/abs/1609.04747, 2016. http://dx.doi.org/10.48550/arXiv.1609.04747
  47. T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, pp. 26–31, 2012. [Online]. Available: https://cir.nii.ac.jp/crid/1370017282431050757
  48. O. Wichrowska, N. Maheswaranathan, M. W. Hoffman, S. G. Colmenarejo, M. Denil, N. de Freitas, and J. Sohl-Dickstein, “Learned optimizers that scale and generalize,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017. http://dx.doi.org/10.5555/3305890.3306069 p. 3751–3760.
  49. K. Lv, S. Jiang, and J. Li, “Learning gradient descent: better generalization and longer horizons,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017. http://dx.doi.org/10.5555/3305890.3305913 p. 2247–2255.
  50. T. Chen, W. Zhang, Z. Jingyang, S. Chang, S. Liu, L. Amini, and Z. Wang, “Training stronger baselines for learning to optimize,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020. http://dx.doi.org/10.5555/3495724.3496339 pp. 7332–7343.
  51. T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2939672.2939785. ISBN 978-1-4503-4232-2 pp. 785–794.
  52. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. http://dx.doi.org/10.48550/arXiv.1603.04467. Software available from tensorflow.org.
  53. D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations, 12 2014. http://dx.doi.org/10.48550/arXiv.1412.6980