My first observation is that you did not lag the inputs relative to the closing price, which is why you observed such a good fit. The SMA (simple moving average) uses the closing price in its calculation, and the high-low range encompasses the closing price, so using them to predict the same day's closing price introduces a look-ahead bias. If you are trying to predict the closing price two days ahead, you should build your model with inputs that are lagged from the closing price by at least two days. Some of the inputs may be lagged by more than two days, but I would start simple and use just a handful of inputs.
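As a rough illustration of the lagging step, here is a minimal Python sketch. The helper names (`lag`, `build_rows`), the column names, and the toy numbers are all made up for the example; the only point is that every input paired with the close at day t is the value that was known at day t - 2.

```python
def lag(series, k):
    """Shift a series k steps back in time: position t holds the value from t - k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    padded = [None] * k + list(series)
    return padded[: len(series)]


def build_rows(target, inputs, horizon):
    """Pair target[t] with every input as it stood at t - horizon."""
    lagged = {name: lag(col, horizon) for name, col in inputs.items()}
    rows = []
    for t, y in enumerate(target):
        features = {name: col[t] for name, col in lagged.items()}
        # Skip the first `horizon` days, where no lagged value exists yet.
        if all(v is not None for v in features.values()):
            rows.append((features, y))
    return rows


# Toy data, purely illustrative.
close = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
day_range = [0.4, 0.6, 0.5, 0.7, 0.3, 0.5]  # high minus low

rows = build_rows(close, {"close": close, "range": day_range}, horizon=2)
```

An indicator such as the SMA would be computed first and then passed through the same `lag` step, so the value used at day t only contains closes up to day t - 2.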
As for your objective of predicting the closing price, I think closing prices are too noisy to be used as target variables, and using them will lead to overfitting or to optimizing the wrong objective. Instead, I would start by smoothing the closing price with a moving average and then predict the direction of the price change over the next two days. For example, I might replace the close with a 5-day SMA of the close and then code the change in the SMA as 1 if it was positive over the next two days and 0 otherwise. Because the output variable is now coded as 1 or 0, this is a good problem to try to solve with the random forest function you were using. You could also try other classification algorithms such as logistic regression, neural networks, and SVMs, and maybe combine a few into an ensemble to improve performance. This is still a difficult problem to solve without overfitting, but it is a step in the right direction. Another word of caution: your final model could be very accurate at classifying the next two days as positive or negative and still lose money because it misclassified a few large moves.
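The target construction described above could be sketched roughly as follows. The function names and the toy price series are hypothetical, and treating a flat SMA as 0 is my reading of "1 if positive, 0 otherwise".

```python
def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[t + 1 - window : t + 1]) / window)
    return out


def direction_labels(close, window=5, horizon=2):
    """Label day t with 1 if the SMA rises between t and t + horizon, else 0.

    Days without a full SMA window, or without `horizon` future days,
    get None and should be dropped before training.
    """
    smooth = sma(close, window)
    labels = []
    for t in range(len(close)):
        if smooth[t] is None or t + horizon >= len(close):
            labels.append(None)
        else:
            labels.append(1 if smooth[t + horizon] > smooth[t] else 0)
    return labels


# Toy series, purely illustrative.
rising = [float(p) for p in range(1, 11)]
falling = list(reversed(rising))

rising_labels = direction_labels(rising)
falling_labels = direction_labels(falling)
```

The label at day t looks two days into the future by design, so it belongs on the target side only. The inputs paired with it still have to be values known at day t.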
I would also recommend building your model on more than one security so that the machine learning algorithm does not home in on the idiosyncrasies of one stock. I would start with at least 5 stocks that are not highly correlated with each other.
Trading on the Edge by Guido Deboeck is a good place to start exploring the applications of machine learning to financial time series prediction. It's an older book, so it is well behind the technology we have available today, but it is a good start. I would also recommend New Trading Systems and Methods by Kaufman and Expert Trading Systems by John Wolberg.
You may want to start by doing some exploratory analysis before you dive into making a prediction model or put this into a prediction modelling framework.
Plot the data to see whether any trends appear. It is likely that some of your explanatory variables are completely redundant. Depending on how much data you have, keeping them may cause your prediction model to overfit.
Energy consumption most likely depends on the weather, particularly temperature and humidity (although wind also plays a part). For example, people turn on their radiators when it is cold and their AC when it is warm. Time of day is also important: when people are away from home during the day, they are probably not using as much energy there.
Instead of using the time of day as a variable, you can split it into a few factor levels, e.g. night, morning, working day, evening. This helps reduce overfitting.
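A minimal sketch of that bucketing in Python. The cut-off hours are my own assumption; pick boundaries that match the daily pattern in your data.

```python
def time_of_day_bucket(hour):
    """Map an hour (0-23) to a coarse factor level. Boundaries are assumed."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour < 6:
        return "night"
    if hour < 9:
        return "morning"
    if hour < 17:
        return "working_day"
    if hour < 22:
        return "evening"
    return "night"  # late evening wraps back into the night bucket


buckets = [time_of_day_bucket(h) for h in (3, 7, 12, 19, 23)]
```

The model then has 4 levels to estimate instead of 24, which is where the reduction in overfitting comes from.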
You might also want to introduce factor variables that indicate whether a given day is a national holiday or a special event, e.g. on Christmas or during the Super Bowl energy consumption will likely spike. These big spikes/outliers are the hardest part of your data to model, so you need to build your expert knowledge of the problem into the equation to account for them.
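One simple way to encode this is a 0/1 indicator looked up from a calendar you maintain yourself. The `SPECIAL_DAYS` set below is only a placeholder with two example dates, not a complete holiday calendar.

```python
from datetime import date

# Placeholder calendar: replace with the holidays/events relevant to your region.
SPECIAL_DAYS = {
    date(2023, 12, 25),  # Christmas Day
    date(2024, 2, 11),   # Super Bowl Sunday
}


def special_day_flag(day):
    """Return 1 if `day` is in the hand-maintained calendar, else 0."""
    return 1 if day in SPECIAL_DAYS else 0


flags = [special_day_flag(d) for d in (date(2023, 12, 24), date(2023, 12, 25))]
```

If different events push consumption in different directions, separate indicators (one per event type) are probably more useful than a single flag.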
This is not an easy problem and usually the method that you use is not what is most important. What matters most is how you preprocess your data and how you add in your own assumptions about the situations (e.g. the holidays).
The easiest way to go is to use a linear model or a random forest. Random forests are easy to use in most languages and are relatively robust against overfitting.
A random forest also gives you something called variable importance. It shows how "important" each variable is for making predictions and may help in interpreting the results.
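For example, with scikit-learn (my choice of library here, not something from the question) the importances are available on the fitted model as `feature_importances_`. The data below is synthetic and only meant to show the mechanics: consumption depends on temperature and not at all on the second column.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic inputs: one informative variable, one pure-noise variable.
temperature = rng.uniform(-5, 35, n)
noise_feature = rng.normal(size=n)

# Synthetic consumption: rises when it is cold or hot (heating / AC).
consumption = np.abs(temperature - 18) + rng.normal(scale=0.5, size=n)

X = np.column_stack([temperature, noise_feature])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, consumption)

importances = dict(zip(["temperature", "noise_feature"], model.feature_importances_))
```

These importances are relative and sum to 1. When two inputs are strongly correlated the importance tends to be shared between them, so a low score does not by itself prove a variable is irrelevant.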
Hope this helps. Just don't dump this into some model head first; think about the problem and what matters for these predictions. Also look at the residuals after you have fitted the model.
Best Answer
$\text{flow}_{t-1}$ as an explanatory variable. To obtain the forecast one period ahead (at $t+1$), you simply fit the model using the data you have at time $t$. For forecasts more than one period ahead, you need slightly more complicated dynamics.
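To make the one-period-ahead case concrete, here is a minimal sketch that assumes the model is a plain linear regression of the flow on its own previous value and nothing else. That is a simplification on my part; the actual model may include other explanatory variables. The function names and the toy series are hypothetical.

```python
def fit_lag1(series):
    """Least-squares fit of series[t] = intercept + slope * series[t - 1]."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    x = series[:-1]  # flow at t - 1
    y = series[1:]   # flow at t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged values are constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(series):
    """One-step-ahead forecast using only data available up to the last point."""
    intercept, slope = fit_lag1(series)
    return intercept + slope * series[-1]


# Toy series generated by flow[t] = 2 + 0.5 * flow[t - 1], purely illustrative.
flow = [10.0, 7.0, 5.5, 4.75, 4.375]
next_flow = forecast_next(flow)
```

The last observed value plays the role of the lagged regressor for the forecast, which is why nothing beyond the data at time $t$ is needed for $t+1$.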