The reason is that ARIMA is auto-projective: it uses the most recent data to compute what is essentially a weighted average of past values. When forecasting, the one-step-ahead forecast is used to predict the second step, and so on, which leads to long-term forecasts that approach an asymptote. When fitting, by contrast, the actual history is used to predict the next point.
What you should do is build an integrated model that includes deterministic structure such as day-of-the-week effects, changes in the day-of-the-week coefficients, and monthly/weekly effects, taking into account events like holidays, and that includes any level shifts or local time trends that can be detected and incorporated. Particular days of the month and particular weeks of the month may also come into play. Do this by hour and for the daily sums. Furthermore, use the daily-sum history and its forecasts as a possible predictor variable for each of the 24 hourly models. Make sure you verify that the parameters of each of your 25 models (24 hourly plus one daily) are invariant over time, and that the error variance of each model doesn't change over time. Finally, you may need a parent-to-child or child-to-parent strategy to reconcile any differences between the hourly and daily forecasts. We have been very successful using these procedures, and you should be successful too.
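To make the contrast concrete, here is a minimal sketch in base R on simulated daily data. The day-of-week effects, the AR coefficient, and the helper `dummies()` are all made up for illustration. It fits a plain AR(1) model and an AR(1) model with day-of-week dummies as regressors, then compares their 28-day forecasts.

```r
set.seed(1)

n       <- 7 * 60                           # 60 weeks of simulated daily data
dow     <- rep(1:7, length.out = n)         # day-of-week index, 1..7
effect  <- c(0, 1, 2, 2, 1, -4, -5)         # made-up day-of-week effects
y       <- 100 + effect[dow] + arima.sim(list(ar = 0.5), n = n)

# Hypothetical helper: six 0/1 dummy columns, with day 1 as the baseline
dummies <- function(d) model.matrix(~ factor(d, levels = 1:7))[, -1]

# Plain AR(1): each forecast feeds the next, so forecasts decay to the mean
plain    <- arima(y, order = c(1, 0, 0))
fc_plain <- predict(plain, n.ahead = 28)$pred

# AR(1) errors plus deterministic day-of-week structure
det    <- arima(y, order = c(1, 0, 0), xreg = dummies(dow))
# n is a multiple of 7, so the next 28 days start again at day 1
fc_det <- predict(det, n.ahead = 28, newxreg = dummies(rep(1:7, 4)))$pred

diff(range(tail(fc_plain, 7)))   # close to zero: the forecast has flattened
diff(range(tail(fc_det, 7)))     # still spans the weekly pattern
```

The last week of the plain forecast is essentially a flat line, while the model with dummies keeps reproducing the weekly shape. That is why the deterministic terms matter for long horizons.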
On the one hand, you can certainly use `ts` objects. Specify the season length through the `frequency` parameter:
auto.arima(ts(data[,1],frequency=48))
However, electricity consumption usually has multiple seasonalities. You have the intra-day seasonality, but you will usually also have weekly seasonality, since weekends have different power consumption than weekdays - people are at home instead of at work, industrial production is reduced etc. Plus, there may be yearly seasonality, with air conditioners and heaters consuming a lot of electricity in summer/winter.
Use `msts` objects to encode time series with multiple seasonalities:
msts(data[,1],seasonal.periods=c(48,7*48))
You can then fit models with multiple seasonalities using `tbats`:
tbats(msts(data[,1],seasonal.periods=c(48,7*48)))
Both are again in the `forecast` package. You may want to look at earlier questions on "multiple seasonalities".
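For a self-contained run of the `msts`/`tbats` route, here is a rough sketch on simulated half-hourly data. The series `demand` and all its numbers are invented for illustration; substitute your own column such as `data[,1]`. Fitting `tbats` can take a little while even on a few weeks of data.

```r
library(forecast)

set.seed(42)
n <- 48 * 7 * 4                          # four weeks of half-hourly readings
t <- seq_len(n)
demand <- 100 +
  10 * sin(2 * pi * t / 48) +            # intra-day cycle (48 half-hours)
  5  * sin(2 * pi * t / (7 * 48)) +      # weekly cycle
  rnorm(n, sd = 2)                       # noise

# Declare both seasonal periods, fit TBATS, and forecast one day ahead
y   <- msts(demand, seasonal.periods = c(48, 7 * 48))
fit <- tbats(y)
fc  <- forecast(fit, h = 48)

head(fc$mean)                            # first few point forecasts
```

With real data you would want at least several full weeks of history so that the weekly period can be estimated at all.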
Best Answer
I've been thinking more about my previous answer, and now I'm not so sanguine.
A problem arises because electricity consumption varies by hour depending on both external environmental conditions (especially, temperature), and also on the social conventions that determine work patterns. When daylight savings time begins or ends, the alignment between these two shifts abruptly: the "hour during which the sun sets" may shift from falling during the work day, to falling during evening/dinner-time.
Hence the challenge involves not just how to edit values immediately at the point of change-over. The question is whether DST and standard time should be considered as, in some sense, distinct regimes.
The care with which you address the issue depends, of course, on what you are going to use the forecast for. For many purposes, it might be OK to just ignore the subtleties, and proceed as per your first proposal. My suggestion remains to try that first, and see if the accuracy of your model is good enough to meet the needs of your specific application.
If results are unsatisfactory, a second stage of complexity might involve breaking your project in half, and creating separate models for the winter regime and the summer regime. This approach has a lot to recommend it, actually: the relationship between temperature and power consumption is roughly U-shaped, hitting a minimum at about 18 degrees C, reflecting differences in the way temperature changes affect demand for heating versus cooling. Hence whatever model you come up with will end up acting something like the union of two separate regime-specific models anyway.
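One common way to capture a U-shaped temperature response in a single regression is to split temperature into "heating degrees" and "cooling degrees" around a balance point. The sketch below uses simulated data, takes 18 degrees C as the assumed balance point, and uses made-up slopes. It is an illustration of the idea, not a calibrated model.

```r
set.seed(7)

temp <- runif(1000, min = -5, max = 38)   # simulated temperatures, degrees C
base <- 18                                # assumed balance point

hdd <- pmax(base - temp, 0)               # degrees below the balance point
cdd <- pmax(temp - base, 0)               # degrees above the balance point

# Simulated demand: rises on both sides of the balance point (made-up slopes)
demand <- 50 + 1.5 * hdd + 2.0 * cdd + rnorm(1000, sd = 3)

# Two separate slopes give the "union of two regimes" in one model
fit <- lm(demand ~ hdd + cdd)
coef(fit)
```

Both slopes come out positive, which is the U shape: demand grows as temperature moves away from the balance point in either direction.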
A variation on the above -- almost a re-phrasing -- would be to include in your regression equation a DST dummy variable. That sounds sensible.
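As a sketch of the dummy-variable idea, the base R code below derives a DST flag from timestamps and adds it to a regression with hour-of-day effects. The time zone, the simulated demand, and the size of the DST effect are all assumptions for illustration, and the code relies on the system's time zone database knowing the chosen zone.

```r
# Hourly timestamps for one year, viewed in a zone that observes DST
times <- seq(as.POSIXct("2023-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 365)
local <- as.POSIXlt(times, tz = "Australia/Sydney")   # illustrative zone

dst  <- as.integer(local$isdst > 0)   # 1 while DST is in force, else 0
hour <- factor(local$hour)            # local clock hour, 0..23

# Simulated demand: a daily profile plus a made-up DST shift of 4 units
set.seed(3)
demand <- 100 + 10 * sin(2 * pi * local$hour / 24) + 4 * dst +
  rnorm(length(times), sd = 2)

# DST as a simple level shift on top of the hourly profile
fit <- lm(demand ~ hour + dst)
coef(fit)["dst"]

# Letting the whole hourly profile differ by regime approximates
# fitting separate summer and winter models
fit_regimes <- lm(demand ~ hour * dst)
```

The interaction version is the link to the two-regime approach: it estimates one hourly profile for standard time and another for DST within a single fit.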
Again, the big question is: how much time and effort does it make sense to devote to exploring this issue and its implications for forecast quality? If you are doing applied work (as I gather you are), the goal is to craft a model that is fit-for-purpose, rather than devote your life to finding the best of all possible models.
If you really want to explore this issue, you might look up this paper:
The authors take advantage of the fact that two Australian states at the same latitude have different rules concerning implementing daylight savings time. This difference creates conditions for a natural experiment regarding the effect of DST on energy consumption, with one state acting as the "treatment group" and its neighbor acting as the "control group". Additional background is available from Hendrik Wolff's website. It's interesting work -- though perhaps overkill for your application.