
Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 43

A Comparative Study of LSTM Efficiency vs. Transformer Power for Localized Time Series Forecasting

DOI: http://dx.doi.org/10.15439/2025F7041

Citation: Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 43, pages 265–276 (2025)

Abstract. Forecasting multivariate time series increasingly relies on deep learning, including models inspired by Neural Machine Translation (NMT). While Transformers excel at long-range dependencies, their computational overhead may not suit all problems. This study argues for the continued relevance of Long Short-Term Memory (LSTM)-based encoder-decoder networks in scenarios with shorter input windows and prediction horizons. We present a comprehensive empirical evaluation of four classical LSTM-based NMT models, including variants with attention mechanisms specifically adapted for multistep time series forecasting. Our assessment focuses on their performance and on the impact of varying input window sizes and prediction horizons within these computationally efficient, short-sequence contexts. We empirically compare these LSTM-based models against Transformer baselines operating under the same short input window and prediction horizon constraints. Key findings indicate that: (i) LSTM-based NMT models match or exceed existing state-of-the-art results for short-term predictions; (ii) within smaller input configurations, input window size has minimal effect on forecasting performance for the tested horizons, suggesting that efficiency gains are possible; (iii) for attention-based NMT models, the choice of attention scoring function critically influences accuracy and demands careful selection; (iv) our comparative analysis demonstrates that, for time series problems where the immediate historical context is sufficient, LSTM-based encoder-decoders are competitive with, or even outperform, Transformer-based models while offering a more computationally efficient solution. Overall, these results indicate that classical LSTM-based NMT models remain robust and capable tools, particularly well suited to short-term time series prediction tasks where local pattern capture and computational efficiency are priorities, even in the era of Transformers.
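For readers unfamiliar with the model class under evaluation, the listing below is a minimal sketch of an LSTM-based encoder-decoder (sequence-to-sequence) forecaster for multistep multivariate prediction, written in a PyTorch style. It is an illustrative assumption, not the authors' implementation: the class name, layer sizes, window length, and horizon are hypothetical and chosen only to show the short-window, short-horizon setting the paper studies.

    # Minimal sketch (illustrative, not the paper's exact model): an LSTM
    # encoder-decoder for multistep multivariate time series forecasting.
    import torch
    import torch.nn as nn

    class Seq2SeqForecaster(nn.Module):
        def __init__(self, n_features, hidden_size=64, horizon=3):
            super().__init__()
            self.horizon = horizon
            # Encoder compresses the short input window into hidden/cell states.
            self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
            # Decoder unrolls one step at a time over the prediction horizon.
            self.decoder = nn.LSTM(n_features, hidden_size, batch_first=True)
            self.proj = nn.Linear(hidden_size, n_features)

        def forward(self, x):
            # x: (batch, window, n_features) -- the localized input window.
            _, state = self.encoder(x)
            step = x[:, -1:, :]             # seed decoder with last observation
            outputs = []
            for _ in range(self.horizon):   # autoregressive multistep decoding
                out, state = self.decoder(step, state)
                step = self.proj(out)       # predict the next time step
                outputs.append(step)
            return torch.cat(outputs, dim=1)  # (batch, horizon, n_features)

    # Example: a 12-step input window over 8 variables, 3-step-ahead forecast.
    model = Seq2SeqForecaster(n_features=8, horizon=3)
    y_hat = model(torch.randn(16, 12, 8))
    print(y_hat.shape)  # torch.Size([16, 3, 8])

The attention-based variants discussed in the paper differ mainly in how the decoder scores encoder outputs at each step; in this short-sequence regime the sketch above already captures the computational footprint being compared against Transformer baselines.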
