Comparative Study of Deep Learning Models for Predicting Stock Prices

Abstract: The stock market is volatile, dynamic, and nonlinear. Hence, predicting stock prices is a challenging task in time series forecasting, and accurate prediction has been a hot topic for both financial and technical researchers. In this paper, we deploy six deep learning models (i.e., MLP, CNN, RNN, LSTM, GRU, and AE) to predict the closing price, one day ahead, of 20 different companies (i.e., 5 groups of 4) in the S&P 500 over a period of more than seven years (January 2015 to August 2022). The experimental results not only provide interesting insights but also help us to deepen our understanding of how to use deep learning models in financial markets.


I. INTRODUCTION
The stock market plays an important role in the global economy today. Accurately predicting stock prices can reduce investment risk and lead to highly profitable investments. Recently, it has been a hot topic for both financial and technical researchers [1]-[3]. However, the stock market is volatile, dynamic, and nonlinear. Hence, predicting stock prices is a challenging task in time series forecasting because of various factors, such as the global economy, political conditions, and the company's financial performance.
There are two approaches to predicting the stock market. The first approach is qualitative (or fundamental) analysis [1], in which intrinsic values are examined, such as the market situation, financial factors, management effectiveness, consumer behavior, and information from social media and economic analysts. The second approach is technical analysis, in which historical stock market activity is examined, for example, the daily opening, closing, maximum, minimum, and adjusted closing prices and the daily trading volume.
Unlike fundamental analysis, which is useful for long-term investment, technical analysis is easily influenced by short-term news. Technical analysis consists of two main families of methods: traditional methods and machine learning methods. To predict stock prices, the former widely use classical time series models, such as AR (Autoregressive), MA (Moving Average), ARMA (Autoregressive Moving Average), and ARIMA (Autoregressive Integrated Moving Average). Among the latter, classical machine learning models have been extensively studied and have obtained significant results because they remain effective with a limited amount of data. Deep learning models, although they require a huge amount of data, are attracting growing interest because of their strength in learning complex patterns in unstructured data.
In this paper, we aim at taking advantage of deep learning models to predict stock prices. Our work has three contributions:
• We carry out experiments to evaluate six deep learning models.
• We conduct experiments on various stocks (i.e., five sectors of four stocks each) from the S&P 500.
• We contribute some insights that might help us to deepen our understanding of how to use deep learning models.
The paper is organized as follows. Section II reviews related work. Section III presents the six deep learning models. The data and the measurements are presented in Section IV. Then, the experimental results are provided in Section V. Finally, we conclude in Section VI.

II. RELATED WORKS
The strength of deep learning models is their ability to find hidden features (or patterns) through a self-learning mechanism. Their main challenge is that they require a massive amount of data. However, a large amount of stock market data can be collected easily. In this section, we review some works related to exploiting deep learning models for predicting the stock market.
To predict stock behavior, Xiong et al. [4] collect Google domestic trends as indicators and compare a traditional model for time series data, namely GARCH (Generalized Autoregressive Conditional Heteroskedasticity) [5], with a state-of-the-art model for dealing with long-term dependencies, namely LSTM (Long Short-Term Memory) [6]. The results show that LSTM is superior to GARCH. Fischer and Krauss [7] apply LSTM to the stocks of the S&P 500 from 1992 until 2015. Interestingly, LSTM performs better than a random forest, a standard deep neural net, and a logistic regression. Yu and Yan [8] combine LSTM with the time series phase-space reconstruction (PSR) method to predict stock prices. The experiments performed on six stock indices from various markets (S&P 500, DJIA, N 225, HSI, CSI 300, and ChiNext) demonstrate that the proposed model outperforms ARIMA, SVR (Support Vector Regressor), MLP (Multilayer Perceptron), and LSTM without PSR. Furthermore, Karmiani et al. [9] compare several predictive algorithms for the stock market. The results obtained on nine technology companies show that LSTM is a better choice than SVM, Backpropagation, and Kalman filter algorithms. To predict the index price of the stock market on the next day, Gao et al. [10] evaluate MLP, LSTM, CNN (Convolutional Neural Network), and an attention-based neural network. To do so, the S&P 500 index (the most developed market), the CSI 300 index (the less developed market), and the Nikkei 225 index (the developing market) are considered. The authors show that the attention-based model is only marginally better than the three other models.
It is understandable that LSTM has gained significant popularity in stock prediction, because LSTM is a state-of-the-art model for sequential data. Surprisingly, to the best of our knowledge, there are only a few attempts to employ deep learning models, including LSTM, in financial markets. Hence, this inspires us to conduct an experimental comparison of deep learning models in terms of predicting stock prices.

III. MODELS
Here we present the six typical deep learning models that are used in the experiments reported in Section V.

A. Multilayer Perceptron (MLP)
An Artificial Neural Network (ANN) is a computational model that imitates the way neurons in the human brain process information through connections among nodes. An ANN consists of three parts: a layer of input nodes, one or more layers of hidden nodes, and a layer of output nodes (see Figure A1). Each layer consists of a group of neurons (nodes) that are connected to others via weighted links. A multilayer perceptron (MLP) is a fully connected ANN. The input layer receives the input data and passes it to the first hidden layer; each hidden layer then processes the output of the previous layer and passes it to the next layer; finally, the output layer produces the result. MLPs learn by adjusting their weight values. Thanks to their layered structure, MLPs are able to map nonlinear inputs to outputs by automatically extracting subtle patterns and multiple features from a large dataset through each layer (see Figure 1).
Fig. 1. An illustration of a simple MLP that consists of two consecutive hidden layers located between the input layer and the output layer. Every neuron is interconnected and assigned weights (represented by arrows). Each neuron learns/adjusts its weights through its inputs and desired outputs.
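The layer-by-layer flow described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the layer sizes, the ReLU activation, the random initial weights, and all function names are hypothetical choices for illustration and are not taken from the experiments in this paper. No training is performed.

```python
import random


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """Hidden layers use ReLU; the last layer is linear, a common choice for regression."""
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(x, weights, biases, identity)


rng = random.Random(0)
# 5 inputs -> hidden layer of 8 -> hidden layer of 8 -> 1 output (hypothetical sizes)
layers = [init_layer(5, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]
prediction = mlp_forward([0.1, 0.2, 0.05, 0.15, 0.3], layers)
print(prediction)  # a list holding one output value of the untrained network
```

Learning would then consist of adjusting the entries of each `weights` matrix so that the output moves toward the desired value.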

B. Convolutional Neural Network (CNN)
The breakthrough came when one variant of these models, the Deep Convolutional Neural Network, was ranked first in the ImageNet Large Scale Visual Recognition Challenge after approaching human performance in image classification [11]. Since then, Convolutional Neural Networks (CNNs) and their descendants have been approaching superhuman performance in a wide range of domains, including pattern recognition, natural language processing, video processing, speech recognition, and time-series forecasting [12].

C. Recurrent Neural Network (RNN)
Despite their great success, CNNs face a big issue: they are not inherently designed to cope with time series and sequential data. To deal with this issue, the Recurrent Neural Network (RNN) [13], another deep neural network model, was introduced. Owing to their flexible architecture, computational power, and rich inherent memory through feedback connections, RNNs have a wide range of applications in sequential data, including machine translation [14], speech recognition [15], time series anomaly detection [16], and time series forecasting [17].
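The "memory through feedback" mentioned above can be sketched with a basic (Elman-style) recurrent step, in which the hidden state of the previous time step is fed back together with the current input. The tanh activation, the weight values, and the function names below are illustrative assumptions rather than the configuration used in this paper.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(w * x for w, x in zip(w_x[i], x_t))      # contribution of the input
        z += sum(w * h for w, h in zip(w_h[i], h_prev))   # feedback from the past
        h_t.append(math.tanh(z))
    return h_t


def rnn_forward(sequence, w_x, w_h, b):
    """Run the step over a whole sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h


# Hypothetical weights: 1 input feature, 2 hidden units.
w_x = [[1.0], [0.5]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

seq_a = [[1.0], [0.2]]
seq_b = [[-1.0], [0.2]]
# Both sequences end with the same input, yet the final states differ,
# because the hidden state carries information from earlier steps.
print(rnn_forward(seq_a, w_x, w_h, b))
print(rnn_forward(seq_b, w_x, w_h, b))
```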

D. Long Short-Term Memory (LSTM)
Nevertheless, RNNs have two serious problems: vanishing/exploding gradients and learning long-term dependencies [6]. First, like CNNs, RNNs learn by adjusting weight values, and these values are updated through the backpropagation algorithm. Unfortunately, in the case of RNNs with a large number of hidden layers, performing the backpropagation algorithm leads to the vanishing gradient (i.e., exponential decrease) and exploding gradient (i.e., exponential growth) problems, because a large number of derivatives have to be multiplied. Second, RNNs are only able to capture short-term dependencies in sequences. Therefore, a new type of architecture is needed to deal with these two problems. To this end, Long Short-Term Memory (LSTM) was introduced by Hochreiter and Schmidhuber [6]. LSTM is a type of RNN that is specifically designed to deal with longer dependencies in sequences [18] and to mitigate the gradient problems described above. Unlike plain RNNs, instead of adding regular neural units (i.e., hidden layers), LSTM adds memory blocks. A common LSTM memory block consists of a cell state and three gates: an input gate, a forget gate, and an output gate.
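To make the roles of the cell state and the three gates concrete, the following sketch implements one step of a commonly used LSTM formulation. The parameter layout (one weight matrix per gate acting on the concatenation of the input and the previous hidden state), the dictionary keys, and the numeric values are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. `params` maps 'input', 'forget', 'output' and 'candidate'
    to (weights, bias) pairs that act on the concatenation [x_t, h_prev]."""
    z = list(x_t) + list(h_prev)
    i = [sigmoid(v) for v in affine(params["input"][0], z, params["input"][1])]
    f = [sigmoid(v) for v in affine(params["forget"][0], z, params["forget"][1])]
    o = [sigmoid(v) for v in affine(params["output"][0], z, params["output"][1])]
    g = [math.tanh(v) for v in affine(params["candidate"][0], z, params["candidate"][1])]
    # The forget gate scales the old cell state; the input gate scales new content.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # The output gate decides how much of the cell state is exposed as h_t.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t


# Hypothetical parameters: 1 input feature, 1 hidden unit, all weights zero.
zero_w = [[0.0, 0.0]]
params = {
    "input": (zero_w, [-20.0]),     # input gate almost closed
    "forget": (zero_w, [20.0]),     # forget gate almost fully open
    "output": (zero_w, [0.0]),      # output gate at 0.5
    "candidate": (zero_w, [0.0]),
}
h_t, c_t = lstm_step([1.0], [0.0], [0.7], params)
print(h_t, c_t)  # the cell state stays very close to its previous value 0.7
```

With the forget gate open and the input gate closed, the cell state passes through the step almost unchanged, which is the mechanism that lets information persist over many time steps.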

E. Gated Recurrent Unit (GRU)
To deal with the vanishing and exploding gradient problems, Cho et al. [19] introduced the gated recurrent unit (GRU), a variation of LSTM whose architecture has no output gate. While LSTM consists of three gates (i.e., input, forget, and output gates), GRU comprises two gates: the reset gate and the update gate. Therefore, GRU has fewer parameters. The advantage of GRU over LSTM is its lower computational cost, while GRU obtains comparable results to LSTM in many cases [20].
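As a rough illustration of the two gates and of the parameter saving, the sketch below implements one step of a textbook GRU formulation and counts the parameters of a textbook LSTM and GRU layer. The interpolation convention (the update gate weighting the new candidate), the parameter layout, and the layer sizes are assumptions; library implementations may differ in details such as extra bias terms.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def gru_step(x_t, h_prev, params):
    """One GRU step. `params` maps 'update', 'reset' and 'candidate' to (weights, bias)."""
    xh = list(x_t) + list(h_prev)
    z = [sigmoid(v) for v in affine(params["update"][0], xh, params["update"][1])]
    r = [sigmoid(v) for v in affine(params["reset"][0], xh, params["reset"][1])]
    # The reset gate controls how much of the past enters the candidate state.
    x_rh = list(x_t) + [r_k * h_k for r_k, h_k in zip(r, h_prev)]
    cand = [math.tanh(v) for v in affine(params["candidate"][0], x_rh, params["candidate"][1])]
    # The update gate interpolates between the old state and the candidate.
    return [(1.0 - z_k) * h_k + z_k * c_k for z_k, h_k, c_k in zip(z, h_prev, cand)]


def gated_layer_params(n_in, n_hidden, n_blocks):
    """Each block has a weight matrix on [x_t, h_prev] plus a bias vector."""
    return n_blocks * (n_hidden * (n_in + n_hidden) + n_hidden)


zero_w = [[0.0, 0.0]]
keep = {"update": (zero_w, [-20.0]), "reset": (zero_w, [0.0]), "candidate": (zero_w, [0.5])}
print(gru_step([1.0], [0.3], keep))  # update gate closed: state stays close to 0.3

# Hypothetical layer with 5 input features and 8 hidden units.
lstm_count = gated_layer_params(5, 8, 4)  # 3 gates + candidate
gru_count = gated_layer_params(5, 8, 3)   # 2 gates + candidate
print(lstm_count, gru_count)
```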

F. Autoencoder (AE)
An Autoencoder (AE) is a special type of artificial neural network [21]. An AE consists of three parts: 1) an encoder that converts the input into the bottleneck; 2) a bottleneck, which is a compressed representation keeping only the most important information; and 3) a decoder that reconstructs the original input from the bottleneck. An AE is able to handle data in which the features are correlated. Hence, dealing with noisy data is an interesting advantage of the Autoencoder.
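The encoder, bottleneck, and decoder roles can be illustrated with a toy linear example. It assumes two perfectly correlated features (the second is twice the first) and uses hand-picked weights instead of learned ones, so it is a sketch of the idea rather than a trained autoencoder; all names and numbers are hypothetical.

```python
import math


def encode(x):
    """Encoder: compress the 2-D input into a single bottleneck value."""
    return (x[0] + 2.0 * x[1]) / 5.0


def decode(code):
    """Decoder: expand the 1-D bottleneck back to two features."""
    return [code, 2.0 * code]


def reconstruct(x):
    return decode(encode(x))


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


clean = [1.0, 2.0]   # satisfies the assumed relation x2 = 2 * x1
noisy = [1.1, 1.9]   # the same point with a small perturbation

print(reconstruct(clean))
print(distance(noisy, clean), distance(reconstruct(noisy), clean))
```

Because the bottleneck can only represent points that respect the correlation between the two features, the part of the perturbation that violates this correlation is discarded during reconstruction, and the reconstructed point lies closer to the clean one than the noisy input does.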

IV. DATA AND MEASUREMENTS

A. Data
In order to evaluate the six deep learning models, we apply them to datasets with different characteristics. In particular, there are five groups of stocks corresponding to five sectors in which the companies operate: Consumer Cyclical, Communication Services, Energy, Healthcare, and Technology. For each group, we select four stocks of influential companies in the sector. The list of 20 stocks is shown in Table I.
The historical data of each stock is collected from January 2nd, 2015 to August 8th, 2022. There are six columns in each dataset: Date, Open, High, Low, Close, and Volume. Figure 2 presents the data of the CVX stock from the Energy group. Chevron Corporation engages in integrated energy and chemicals operations worldwide.
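A dataset with the six columns listed above could be read as in the following sketch. The CSV layout, the ISO date format, the function name `load_prices`, and the sample rows are all assumptions for illustration; the numbers are placeholders and not real quotes of any stock.

```python
import csv
import io
from datetime import datetime

# Placeholder rows with made-up numbers (NOT real market data).
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2015-01-05,101.0,102.5,100.0,101.5,1200000
2015-01-02,100.0,101.0,99.0,100.5,1000000
2015-01-06,101.5,103.0,101.0,102.0,900000
"""

PRICE_COLUMNS = ("Open", "High", "Low", "Close")


def load_prices(handle):
    """Parse rows with Date, Open, High, Low, Close, Volume and sort them by date."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"Date": datetime.strptime(record["Date"], "%Y-%m-%d").date()}
        for name in PRICE_COLUMNS:
            row[name] = float(record[name])
        row["Volume"] = int(record["Volume"])
        rows.append(row)
    rows.sort(key=lambda r: r["Date"])  # oldest trading day first
    return rows


rows = load_prices(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["Date"], row["Close"], row["Volume"])
```

Sorting by date matters for time series work, since the look-back windows used later assume chronological order.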

B. Evaluation
The performance of the six models was evaluated using five statistical indices: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), Normalized Root Mean Squared Error (NRMSE), and Coefficient of Determination (R²). In the definitions of these indices, ŷ_i is the i-th predicted daily closing price, y_i is the i-th observed daily closing price, ȳ denotes the mean observed daily closing price, and n is the total number of data samples evaluated.
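The five indices can be computed as in the sketch below, which uses their standard textbook definitions. Two points are assumptions rather than statements from this paper: NRMSE is normalized here by the mean observed price (normalizing by the range or the standard deviation is also common), and MAPE is reported in percent. The sample values are made up.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - yhat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (an assumed normalization)."""
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


observed = [100.0, 200.0, 300.0]    # made-up closing prices
predicted = [110.0, 190.0, 310.0]   # made-up forecasts
for name, metric in [("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("NRMSE", nrmse), ("R2", r2)]:
    print(name, metric(observed, predicted))
```

MAE and RMSE are expressed in price units, so they grow with the price level of a stock, whereas MAPE, NRMSE, and R² are relative measures.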

V. EXPERIMENTAL RESULTS
In this paper, we use the Close price of the next day as the target variable, while five features (Open, High, Low, Close, and Volume) are chosen as input. In stock forecasting problems, finding the size of the look-back window (the number of lags) is an important step. However, there is no established rule for choosing the best number of lags for a forecasting problem. Therefore, we run experiments to find the best input size for forecasting stock prices. We run the models in five cases with different input sizes: 3 days, 1 week, 2 weeks, 3 weeks, and 1 month. Overall, we run experiments with six deep learning models on twenty stocks, and the result for each stock is summarized over the five cases with different input sizes. The results of the experiments allow us to compare the performance of the models and also to select the best input size for predicting stock prices. The results of the models change with the dataset and with the input size. The forecast results of the six models for UNH stock using 1 week of input are shown in Table III. Equations (1) and (2), which define MAE and RMSE, explain the reason for these differences: both use the absolute error between the observed and forecasted values, so when the prices of two stocks are on different scales, the MAE and RMSE results on the two stocks also differ. In order to summarize results from multiple stocks, we use three indices: MAPE, NRMSE, and R². These three indices measure the relative error between real and predicted values, so we can compare the results of the models across stocks and see the bigger picture. Figure 3 provides the boxplot of NRMSE results for the six models in all cases of the experiment. From the boxplot, we can compare the performance of the models in forecasting stock prices.
Though the median NRMSE of the RNN model is not significantly lower than that of the CNN model, the width of the box and the values of the right boundary are significantly smaller for the RNN model than for the other five models, indicating that the RNN model provides an overall lower forecast error.
The R², or coefficient of determination, provides an indication of goodness of fit and therefore a measure of how well unseen samples are likely to be predicted by the models. The larger the R² score, the better the result, and the best result is obtained when the R² score equals 1. In Figure 4, we do not compare the models with each other but summarize the results of all models and group them by input size. As we can see, changing the input size from 3 days to 5 days does not improve the results much. When the input size equals 10 days (i.e., 2 weeks), the R² score is 0.729, which is the largest score among the five input sizes. Further increasing the input size to 15 and 20 days makes the forecast results worse, as the R² scores decrease.
From Figures 3 and 4, we can answer the question: which are the best model and the best input size for the stock price prediction problem? Figure 3 shows that the RNN model is the best model overall, and Figure 4 tells us that 10 days is the best input size for this problem. Now we compare the models on a smaller scale, namely groups of stocks, and find the best model for each group. Figure 5 presents the performance of the models using the MAPE index. There are five groups (Communication Services, Consumer Cyclical, Energy, Healthcare, and Technology), and for each group we show the results of the six models. In the Energy group, the six models all perform well, and the GRU model has the smallest forecast errors. In the four other groups, the RNN model and the CNN model outperform the four other models. In particular, the RNN model has the best result and the CNN model the second best in the Communication Services and Technology groups. In the Healthcare and Consumer Cyclical groups, the CNN model has the best performance and the RNN model is second.
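The look-back setup used in these experiments, in which a window of past days with five features is used to predict the Close price of the next day, can be sketched as follows. The function name `make_windows`, the feature order, and the synthetic rows are assumptions for illustration; the five window lengths correspond to the input sizes of 3, 5, 10, 15, and 20 trading days discussed in this section.

```python
def make_windows(rows, lookback, target_index=3):
    """Build (input window, next-day target) pairs.

    rows: one [open, high, low, close, volume] list per trading day, oldest first.
    target_index: position of the Close price inside each row.
    """
    inputs, targets = [], []
    for end in range(lookback, len(rows)):
        inputs.append(rows[end - lookback:end])   # the `lookback` days before day `end`
        targets.append(rows[end][target_index])   # Close price of day `end`
    return inputs, targets


# Synthetic rows with made-up values (NOT real market data).
rows = [[float(d), d + 1.0, d - 1.0, d + 0.5, 1000.0 + d] for d in range(30)]

LOOKBACKS = {"3 days": 3, "1 week": 5, "2 weeks": 10, "3 weeks": 15, "1 month": 20}
for label, lookback in LOOKBACKS.items():
    X, y = make_windows(rows, lookback)
    # number of samples, days per window, features per day
    print(label, len(X), len(X[0]), len(X[0][0]))
```

A longer window gives each sample more history but leaves fewer samples for training, which is one reason the best input size has to be found experimentally.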

VI. CONCLUSIONS
In this paper, we have observed that the RNN model produces the most accurate results overall in terms of predicting stock prices. However, the choice of model depends on the sector we want to predict. For the Communication Services and Technology sectors, the RNN model is the best choice. For the Consumer Cyclical and Healthcare sectors, the best model is the CNN model. For stocks in the Energy sector, the GRU model should be used. Unlike many previous studies, which demonstrated that the LSTM model is a state-of-the-art model for financial time series data, our results show that the RNN model is the most suitable model for predicting stock prices because of the short temporal dependency of the data.
In the future, we are going to explore several more deep learning models (e.g., the Variational Autoencoder (VAE) and the Generative Adversarial Network (GAN)) on further companies.