An analysis of the effect of the heat index on electricity load forecasting with a Long Short-Term Memory model

Abstract: Accurate electricity load forecasting is essential for operating electrical systems. Most studies on electricity load forecasting are based on electricity load data or weather data such as air temperature, but they do not consider the heat index. This paper proposes a short-term electricity load forecasting model using Long Short-Term Memory (LSTM) based on electricity load history and heat index data. The proposed model is applied to data from IEVN NLDC (National Load Dispatching Center) to forecast electricity load 48 hours ahead, both for the Vietnamese national system and for the power corporations of Vietnam. For a fair comparison, the parameters of the LSTM network are fixed, and the results obtained using temperature are compared with those obtained using the heat index. According to experimental results based on the mean absolute percentage error (MAPE), the proposed model is more accurate than the model based on electricity load history and temperature.


I. INTRODUCTION
In Vietnam, the real-time load of the nation, of individual regions, and of power corporations is managed using a top-down approach. This approach considers the generating capacity within each area and the capacity transferred over the lines linking areas and regions. Electricity load forecasting is mainly based on historical samples. In addition, factors affecting load include the day of the week, days off, holidays, and weather elements such as temperature and unusual weather phenomena like typhoons. Among the weather-related factors, temperature is considered the most important indicator of load fluctuation. Other factors, such as humidity and rainfall, are also taken into account. However, if these factors are considered alone and treated as independent input variables, the forecasting efficiency is not high.
In short, analyzing and evaluating the effects of these factors on electricity load is difficult, and few solutions or studies address it effectively, especially under the specific conditions of a tropical nation with a high proportion of residential electricity consumption (30%-40%), such as Vietnam.
Long Short-Term Memory (LSTM) is an improved version of the Recurrent Neural Network (RNN). Because fog formation is closely related to meteorological elements, [1] used an LSTM model to predict short-term fog from meteorological data. The model consists of an LSTM network and a fully connected layer with time-series input data. Although fog forecasting still faces many challenges, the proposed LSTM model gives better results than existing models such as AdaBoost, CNN, and KNN. Because of its effectiveness in learning temporal dependencies, [2] presented an LSTM-based model for urban sound classification. In [3], stacked bidirectional and unidirectional LSTM networks are used to forecast network-wide traffic state, with LSTM and BiLSTM layers filling in missing values to yield better predictions. That paper also shows that the two-layer model is the most effective, and that the model with two BiLSTM layers is the best among the LSTM and BiLSTM composite models. The proposed model performs well only when the rate of missing data is small; when the rate of missing values is large, its performance decreases while the error of BGCP remains nearly unchanged. [ regression and estimating confidence interval. In particular, the independent variable is the predicted value, while the dependent variable is the actual value. In [6], the LSTM model was used to select features and improve classification accuracy. In [7], the pinball loss function was integrated with the LSTM network to forecast confidence intervals, so the outputs are quantiles rather than the expectation of the random variable, which is critical for unsteady data. Its performance is strong, especially on data from small and medium-sized enterprises (SMEs) in Ireland. However, the model does not perform well on highly variable data, particularly residential load data that reflects household chores and working hours.
Furthermore, the model does not use an optimization approach to choose the LSTM's parameters. In general, LSTM is a popular and effective model, and it is particularly important for short-term electrical load forecasting.
An important indicator used in research on the health effects of heat is Steadman's heat index [8] [9]. Steadman's heat index converts current weather conditions into the temperature that humans would perceive at a reference dew point of 57.2°F (14.0°C) [8] [10]. In this way, Steadman combined air humidity, temperature, and several other factors [8] [9] into an index with the same unit of measurement as air temperature. Based on Steadman's original table of apparent temperature [8], the "Heat Index" (HI) is an approximate, simplified version that depends only on air temperature and humidity. [11] compared 21 algorithms for calculating HI and found that the NWS online 2011 algorithm produced HI values that closely resembled the apparent temperatures in Steadman's table and differed little from air temperature when the temperature was below 20°C. In [12], the NWS HI is used to measure how people perceive heat during the day and to study changes in heat in Da Nang, Vietnam, over the period 2020-2049, showing the effect of heat on the health and productivity of workers in Da Nang.
In [13], the authors developed a forecasting model using weighted SVR models with nu-SVR and epsilon-SVR. The model showed good results: the mean absolute percentage error (MAPE) is 5.843% for daily energy consumption data and 3.767% for half-hourly energy consumption data. Another model, proposed by Yaoyao He et al. [14], addresses short-term power load probability density forecasting using kernel-based support vector quantile regression (KSVQR) and Copula theory. The simulation results show that the method has great potential for power load forecasting when an appropriate kernel function is selected for the KSVQR model; the best MAPE (over all tests) is 0.81%, and the best MAE is 47.20. In [15], short-term electrical load forecasting is addressed by combining the extreme learning machine (ELM) with a new switching delayed PSO algorithm. The proposed model performs better than other state-of-the-art ELMs and than the RBFNN algorithm: its MAPE is 2.182%, compared with 2.902% for RBFNN, and its MAE is also lower. In [16], Song Li et al. propose a combined method for short-term electrical load forecasting based on the wavelet transform, the extreme learning machine (ELM), and partial least squares regression (PLSR). The wavelet transform decomposes the time series, and an ELM is applied to each sub-component of the decomposition. The results show that the model improves load forecasting performance: its MAPE for 1-hour-ahead forecasting is 0.2736%, the smallest among all models in that paper.
This paper aims to improve the accuracy of electricity load forecasting by using the heat index. Concretely, the heat index is used as a feature for deep learning, and a long short-term memory (LSTM) network is used for short-term electricity load forecasting. The results show that the heat index is effective and that the proposed model outperforms the temperature-based model.
The structure of the paper is as follows. Section II explains the methodology, including feature selection using the heat index and the proposed LSTM model with the heat index. The experiments and results are presented in Section III. Section IV provides the conclusion and discussion, and Section V, the final section, describes future work.

II. METHODOLOGY
A. Formula
Let $T = \{0, 1, \dots, n, \dots\}$ be the set of discrete time steps. The electricity load time series is denoted by $y_t$, $t \in T$.
The impact factors of electricity load are denoted by $X_t = (X_t^1, \dots, X_t^n)$. In real life, the way these factors act on the load is complicated, and finding a precise model that captures all of them and gives a perfect forecast is difficult. The electricity load $y_t$ is therefore expressed as a function of the impact factors plus an uncertain part $\varepsilon_t$:
$$y_t = f(X_t) + \varepsilon_t$$
The load forecasting model $f(\cdot)$ is the target of this problem: the purpose is to find the $f(\cdot)$ that minimizes the mean absolute percentage error.
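Written out, and using $N$ for the number of hourly points in the evaluation period (a symbol introduced here only for illustration), the search for the best $f(\cdot)$ under the MAPE criterion can be stated as:

$$\hat{f} = \arg\min_{f} \frac{100\%}{N} \sum_{t=1}^{N} \left| \frac{y_t - f(X_t)}{y_t} \right|$$

This form assumes $y_t \neq 0$, which holds for electricity load.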

B. LSTM
LSTM, an improvement of the RNN, was introduced by Hochreiter and Schmidhuber in 1997 [17]. Gers et al. later proposed adding the forget gate, in 2000 [18].
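The forget gate $f_t$ decides which information to keep or discard from the cell state. $f_t$ is calculated by:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
The input gate $i_t$ is calculated by:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
The current cell state $C_t$ is created from all the gates. The forget gate $f_t$ controls the long-term information, and the input gate $i_t$ controls the short-term information:
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
The output gate is responsible for selecting and outputting the necessary information:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
Finally, the output of the LSTM, $h_t$, is calculated by:
$$h_t = o_t * \tanh(C_t)$$
where $W_f, W_i, W_o$ and $b_f, b_i, b_o$ are the weight matrices and biases corresponding to the forget gate, input gate, and output gate, $W_C$ and $b_C$ are those of the candidate cell state $\tilde{C}_t$, $\sigma$ is the sigmoid function, and $x_t$ is the input at time $t$.
The operator * denotes element-wise multiplication.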
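To make the gate computations concrete, here is a minimal NumPy sketch of one LSTM time step with a forget gate, following the standard formulation of [17] [18]: forget, input, and output gates plus a candidate cell state, combined with element-wise multiplication. The helper names `lstm_step` and `run_lstm` and the weight layout (one matrix per gate acting on the concatenation of $h_{t-1}$ and $x_t$) are illustrative choices, not the paper's implementation; in practice a framework LSTM layer would be trained instead.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step with a forget gate.

    W[g] has shape (hidden, hidden + inputs) and acts on [h_{t-1}, x_t];
    b[g] has shape (hidden,). Keys: 'f' forget, 'i' input,
    'c' candidate cell state, 'o' output.
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])       # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat           # new cell state, * is element-wise
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    h_t = o_t * np.tanh(c_t)                   # output h_t
    return h_t, c_t


def run_lstm(xs, W, b, hidden):
    """Run the cell over xs of shape (steps, inputs); return the last h_t."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return h


# Toy usage: 24 hourly steps with 2 input features (e.g. load and heat index).
hidden, n_in = 4, 2
rng = np.random.default_rng(0)
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}
h_last = run_lstm(rng.normal(size=(24, n_in)), W, b, hidden)
print(h_last.shape)  # (4,)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step; this is a quick way to see the forget gate at work.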
C. Heat index
NWS online [11] provides an algorithm to calculate HI for issuing dangerous-heat warnings. HI is calculated as a function of air temperature and relative humidity. Figure 2 shows the heat index obtained with the NWS online algorithm. In this paper, however, we modify the formula to suit electricity load forecasting for Vietnamese consumers. The proposed algorithm, shown in Figure 3, is as follows. When the temperature is below C degrees Fahrenheit, HI is taken to be exactly the air temperature. The parameter C is the cold threshold; it is varied because it determines whether people use heating devices and therefore affects the amount of electricity used. When the temperature is between C and 77°F, HI is calculated as:
$$HI = 0.5 \times \left[T + 61.0 + (T - 68.0) \times 1.2 + RH \times 0.094\right]$$
where $T$ is the air temperature (°F) and $RH$ is the relative humidity (%).
Fig. 2. The heat index calculated with the NWS online formula [11] as a function of temperature and relative humidity. Darker colors indicate a lower heat index and lighter colors a higher heat index.
If the relative humidity is less than 13% and the temperature is greater than 85°F and not less than H°F, the following adjustment is subtracted from the HI above:
$$ADJ_1 = \frac{13 - RH}{4}\sqrt{\frac{17 - |T - 95|}{17}}$$
Here H, the hot threshold, is also varied because it affects the use of electrical equipment for cooling. If the relative humidity is more than 85% and the temperature is between 80 and 87°F, the following adjustment is added to HI:
$$ADJ_2 = \frac{RH - 85}{10} \times \frac{87 - T}{5}$$
For convenience, the input temperature data and the values of C and H, given in degrees Celsius, are converted to degrees Fahrenheit to match the algorithm, and the calculated HI is converted back to degrees Celsius.
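The heat index rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical; between C and 77°F it assumes the NWS simple (Steadman) formula, and above 77°F it assumes the NWS Rothfusz regression, since the text does not restate the latter; the low-humidity condition follows the wording above literally (T > 85°F and T ≥ H); and the square-root argument is clamped at zero, a guard of our own, because NWS defines that adjustment only up to 112°F.

```python
import math


def c_to_f(t_c):
    """Degrees Celsius to degrees Fahrenheit."""
    return t_c * 9.0 / 5.0 + 32.0


def f_to_c(t_f):
    """Degrees Fahrenheit to degrees Celsius."""
    return (t_f - 32.0) * 5.0 / 9.0


def heat_index_c(temp_c, rh, cold_c, hot_c):
    """Heat index (deg C) from air temperature (deg C) and relative humidity (%).

    cold_c and hot_c are the cold threshold C and hot threshold H in deg C.
    All internal arithmetic is in deg F, as in the NWS algorithm.
    """
    t, c, h = c_to_f(temp_c), c_to_f(cold_c), c_to_f(hot_c)
    if t < c:
        # Below the cold threshold, HI equals the air temperature.
        return temp_c
    if t <= 77.0:
        # Assumed: NWS simple (Steadman) formula between C and 77 deg F.
        hi = 0.5 * (t + 61.0 + (t - 68.0) * 1.2 + rh * 0.094)
        return f_to_c(hi)
    # Assumed: NWS Rothfusz regression above 77 deg F.
    hi = (-42.379 + 2.04901523 * t + 10.14333127 * rh
          - 0.22475541 * t * rh - 0.00683783 * t * t
          - 0.05481717 * rh * rh + 0.00122874 * t * t * rh
          + 0.00085282 * t * rh * rh - 0.00000199 * t * t * rh * rh)
    if rh < 13.0 and t > 85.0 and t >= h:
        # Low-humidity adjustment; condition as worded in the text.
        # Clamping at 0 is our assumption (NWS stops at 112 deg F).
        hi -= ((13.0 - rh) / 4.0) * math.sqrt(
            max(0.0, (17.0 - abs(t - 95.0)) / 17.0))
    elif rh > 85.0 and 80.0 <= t <= 87.0:
        # High-humidity adjustment.
        hi += ((rh - 85.0) / 10.0) * ((87.0 - t) / 5.0)
    return f_to_c(hi)
```

For example, `heat_index_c(32.0, 70.0, cold_c=15.0, hot_c=40.0)` returns a value well above the 32°C air temperature, while any temperature below the cold threshold is returned unchanged. Exposing C and H as arguments is what allows a grid of hot-cold threshold pairs to be tried, as in the experiments.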

D. Proposed model
To use the heat index data for electricity load prediction with the LSTM model, we propose the following model (see Figure 4):
• Step 1: Apply a log transformation to the electricity load dataset.
• Step 2: Heat index algorithm: apply the heat index algorithm to compute the heat index from the temperature and humidity data.
• Step 3: Divide the dataset into a training set and a testing set for the experiments.
• Step 4: Train the LSTM model to obtain appropriate parameters for forecasting; the trained model is the final model used to forecast the test set.
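Steps 1 to 3 can be sketched with NumPy as below, turning hourly series into supervised samples for the LSTM of Step 4. The look-back window length, the 80/20 split ratio, the single-target layout, and the function names are hypothetical choices for illustration; the text fixes only the log transform, the heat index feature, the train/test split, and the 48-hour horizon used in the experiments.

```python
import numpy as np


def make_windows(load, heat_index, lookback, horizon):
    """Build supervised (X, y) pairs from hourly series.

    X[k] holds `lookback` consecutive hours of [log-load, heat index];
    y[k] is the log-load `horizon` hours after the last hour in X[k].
    """
    log_load = np.log(np.asarray(load, dtype=float))            # Step 1
    feats = np.stack([log_load, np.asarray(heat_index, dtype=float)],
                     axis=1)                                     # Step 2 output as a feature
    xs, ys = [], []
    for end in range(lookback, len(log_load) - horizon + 1):
        xs.append(feats[end - lookback:end])
        ys.append(log_load[end + horizon - 1])
    return np.array(xs), np.array(ys)


def chronological_split(X, y, train_frac=0.8):
    """Step 3: earlier samples for training, later ones for testing (no shuffling)."""
    n_train = int(len(X) * train_frac)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


load = np.arange(1, 201, dtype=float)   # toy hourly load
hi = np.zeros(200)                      # toy heat index
X, y = make_windows(load, hi, lookback=24, horizon=48)
X_tr, y_tr, X_te, y_te = chronological_split(X, y)
print(X.shape, X_tr.shape[0], X_te.shape[0])  # (129, 24, 2) 103 26
```

`X` has shape (samples, look-back hours, features), the (batch, time steps, features) layout that recurrent layers expect. The split is chronological so the test set never precedes the training data, and forecasts made in log space must be converted back with `np.exp` before errors are measured in MW.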

III. EXPERIMENTS AND RESULTS
A. Data sets
Data set description: The data used in this paper are the national electrical load data and the electrical load data of the power corporations (HNPC, CPC, NPC, SPC, HCMPC), together with temperature (°C) and relative humidity (%). The time series run from 1/1/2018 to 31/5/2020 with a resolution of 1 hour.
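As a quick check on data volume, the number of hourly records per series over this period can be computed as below, assuming the first record is at 00:00 on 1/1/2018, the last at 23:00 on 31/5/2020, and no missing hours (assumptions, since the text does not state them):

```python
from datetime import datetime, timedelta

first = datetime(2018, 1, 1, 0)    # assumed first hourly record
last = datetime(2020, 5, 31, 23)   # assumed last hourly record
n_hours = int((last - first) / timedelta(hours=1)) + 1  # count both endpoints
print(n_hours)  # 21168
```

That is 882 days of 24 hours each; 2020 is a leap year, so February 2020 contributes 29 days.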

B. Evaluation criteria
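Mean absolute percentage error (MAPE) and root mean square error (RMSE) are used to evaluate the forecasting results. They are defined by the following equations:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_{real}(t) - y_{fore}(t)}{y_{real}(t)}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_{real}(t) - y_{fore}(t)\right)^2} \quad (13)$$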
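A minimal NumPy sketch of the two criteria, written directly from their definitions; the function names are our own. If forecasts are produced in log space (Step 1 of the proposed model log-transforms the load), they would be converted back with `np.exp` before these metrics are computed, which is an assumption about the workflow rather than something the text states.

```python
import numpy as np


def mape(y_real, y_fore):
    """Mean absolute percentage error, in percent. y_real must be non-zero."""
    y_real = np.asarray(y_real, dtype=float)
    y_fore = np.asarray(y_fore, dtype=float)
    return float(100.0 * np.mean(np.abs((y_real - y_fore) / y_real)))


def rmse(y_real, y_fore):
    """Root mean square error, in the units of the load (e.g. MW)."""
    y_real = np.asarray(y_real, dtype=float)
    y_fore = np.asarray(y_fore, dtype=float)
    return float(np.sqrt(np.mean((y_real - y_fore) ** 2)))


actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mape(actual, forecast), rmse(actual, forecast))  # about 10.0 and 15.81
```

MAPE is scale-free and therefore comparable between the national system and the smaller power corporations, while RMSE stays in MW and weights large misses more heavily.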

C. Scenarios
Three experimental scenarios are investigated, corresponding to three different inputs:
• Scenario 1: The input to the forecasting model is electricity load data and temperature data.
• Scenario 2: The input data are the same as in Scenario 1, except that the temperature data are replaced by humidity data.
• Scenario 3: The heat index is calculated with the heat index algorithm described above and used as an input together with electricity load data.
For all scenarios, we use the same LSTM model and the same fixed provinces.
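The three scenarios differ only in the second input column, which a small configuration like the following makes explicit. The column names, the toy values, and the `scenario_features` helper are hypothetical; each resulting matrix would feed the same LSTM with fixed hyperparameters, as the comparison requires.

```python
import numpy as np

# Input columns used in each scenario (column names are hypothetical).
SCENARIOS = {
    1: ("load", "temperature"),
    2: ("load", "humidity"),
    3: ("load", "heat_index"),
}


def scenario_features(data, scenario):
    """Stack the columns of one scenario into an (hours, 2) feature matrix."""
    return np.stack([np.asarray(data[name], dtype=float)
                     for name in SCENARIOS[scenario]], axis=1)


toy = {  # hypothetical values for three hours
    "load": [30000.0, 31000.0, 32000.0],
    "temperature": [28.0, 30.0, 33.0],
    "humidity": [80.0, 70.0, 60.0],
    "heat_index": [31.0, 34.0, 38.0],
}
print(scenario_features(toy, 3).shape)  # (3, 2)
```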

D. Results
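In this paper, we chose a 48-hour forecasting horizon because of the operational requirements of A0, the national power system dispatcher. At present, only the electrical load data up to the end of the previous day are available, and they are used to forecast the load for the next day; operational decisions are then made based on that forecast. The last hour of the next day therefore lies 48 hours after the last available observation.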
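The 48-hour figure follows from the timing: if the forecast for day D+1 is made during day D using data only up to the last hour of day D-1, the target hours lie 25 to 48 hours after the last observation. A small check, assuming hourly records time-stamped at the start of each hour and using arbitrary example dates:

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

# Hypothetical example: the forecast is made on day D = 2020-05-31.
last_obs = datetime(2020, 5, 30, 23)   # last available record (end of day D-1)
next_day = datetime(2020, 6, 1)        # day D+1, the day being forecast
targets = [next_day + k * HOUR for k in range(24)]
horizons = [int((t - last_obs) / HOUR) for t in targets]
print(horizons[0], horizons[-1])  # 25 48
```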
For the national load forecasting model, the results obtained with the heat index as an input variable are very positive. In the base model using the temperature input variable, the MAPE for the national load is 4.01%, with a corresponding RMSE of approximately 1580 MW. When the heat index, computed with various hot-cold threshold pairs, is used as part of the input data, the results improve: all experimental models have a MAPE below 3.83%, compared with 4.01% for the temperature model, and the RMSE also improves by about 400 MW. Among the individual models, the 41-4 model gives the best result, with a MAPE of 3.53%. The differences among the ten best models are small, at only 0.3%.
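The relative size of these improvements can be worked out from the figures quoted above for the national load model; no new results are introduced:

```python
def relative_reduction(old, new):
    """Percentage reduction of `new` relative to `old`."""
    return 100.0 * (old - new) / old


base_mape = 4.01       # % MAPE, temperature input (national load)
upper_hi_mape = 3.83   # % MAPE bound below which all heat index models fall
best_hi_mape = 3.53    # % MAPE, 41-4 model
base_rmse = 1580.0     # MW, temperature input
rmse_gain = 400.0      # MW, approximate RMSE improvement

print(round(relative_reduction(base_mape, upper_hi_mape), 1))  # 4.5
print(round(relative_reduction(base_mape, best_hi_mape), 1))   # 12.0
print(round(100.0 * rmse_gain / base_rmse, 1))                 # 25.3
```

So every heat index model reduces the national MAPE by more than about 4.5% in relative terms, the best one by about 12%, and the RMSE drops by roughly a quarter.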
For the power corporations (PCs), the heat index input model also achieves more accurate results than the model using temperature as the input variable. Comparing the results in Table I with those in Tables IV, V, VI, VII, and VIII shows that the prediction models using the heat index with different C-H thresholds as input have lower MAPE and RMSE than the model using temperature as the input variable. The heat index model therefore brings a notable improvement in forecast quality. Although the heat index can be calculated easily from temperature and humidity, using it as the main input parameter keeps the model errors relatively low, so the heat index is applicable to actual operational problems of the electrical system. The difficulty of data collection should also be taken into account: the formula for calculating the heat index is explicit, which makes it a good choice for a practical model because it does not require adding many specialized meteorological factors that are difficult to collect and forecast.

IV. CONCLUSION AND DISCUSSION
This paper proposes a new methodology for studying the heat index in electricity load forecasting. The heat index depends on temperature and humidity, which affect the behavior of electricity consumers. The proposed model, which combines the heat index algorithm with LSTM for short-term electricity load forecasting, achieves better performance. In summary:
• The proposed model is successfully applied to data sets from Vietnam;
• Compared with using the temperature feature, the proposed model achieves better performance in electrical load forecasting.

V. FUTURE WORK
In future work, we aim to improve accuracy further. First, the heat index could be divided into groups and studied separately according to the lunar calendar or the seasons. Second, the heat index could be clustered based on the volume to analyze electricity load forecasting. Finally, optimization methods for feature selection and hyperparameter tuning could be combined to improve the performance of the model.