I've been thinking more about my previous answer, and now I'm not so sanguine.
A problem arises because hourly electricity consumption depends both on external environmental conditions (especially temperature) and on the social conventions that determine work patterns. When daylight saving time begins or ends, the alignment between these two shifts abruptly: the "hour during which the sun sets" may move from falling during the work day to falling during the evening/dinner hour.
Hence the challenge involves not just how to adjust values immediately at the point of change-over. The deeper question is whether DST and standard time should be treated as, in some sense, distinct regimes.
The care with which you address the issue depends, of course, on what you are going to use the forecast for. For many purposes, it might be OK to just ignore the subtleties, and proceed as per your first proposal. My suggestion remains to try that first, and see if the accuracy of your model is good enough to meet the needs of your specific application.
If results are unsatisfactory, a second stage of complexity might involve breaking your project in half, and creating separate models for the winter regime and the summer regime. This approach has a lot to recommend it, actually: the relationship between temperature and power consumption is roughly U-shaped, hitting a minimum at about 18 degrees C, reflecting differences in the way temperature changes affect demand for heating versus cooling. Hence whatever model you come up with will end up acting something like the union of two separate regime-specific models anyway.
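To make the "two regime-specific models" idea concrete, here is a minimal sketch on synthetic data. The data-generating process, the 18 °C pivot, and all the coefficients are assumptions invented for illustration; the point is only that fitting separate lines below and above the demand minimum recovers slopes of opposite sign, one for heating and one for cooling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hourly temperatures and a U-shaped demand curve with
# a minimum near 18 degrees C (heating below, cooling above).
temp = rng.uniform(-5, 35, size=2000)
base = 18.0
load = (50
        + 2.0 * np.maximum(base - temp, 0)   # heating demand
        + 1.5 * np.maximum(temp - base, 0))  # cooling demand
load += rng.normal(0, 1.0, size=temp.size)

def fit_line(x, y):
    """Ordinary least squares for a line; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Fit two regime-specific linear models: "winter" vs "summer".
winter = temp < base
b0_w, b1_w = fit_line(temp[winter], load[winter])
b0_s, b1_s = fit_line(temp[~winter], load[~winter])

print(b1_w, b1_s)  # negative slope below 18 C, positive above
```

Any single global model fitted to such data has to approximate this same V/U shape, which is why it ends up behaving like the union of the two regime-specific fits.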
A variation on the above -- almost a re-phrasing -- would be to include in your regression equation a DST dummy variable. That sounds sensible.
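As a sketch of the dummy-variable approach: the simulated series below, its coefficients, and the assumption that DST shifts the load level by a constant are all hypothetical. The regression simply adds a 0/1 DST indicator alongside temperature, and ordinary least squares recovers the shift.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1000
temp = rng.uniform(-5, 35, n)
dst = (rng.random(n) < 0.5).astype(float)  # 1 during DST, 0 otherwise

# Hypothetical data-generating process: DST lowers load by 3 units.
load = 60 + 0.5 * temp - 3.0 * dst + rng.normal(0, 1, n)

# OLS with an intercept, temperature, and the DST dummy.
X = np.column_stack([np.ones(n), temp, dst])
coef, *_ = np.linalg.lstsq(X, load, rcond=None)

print(coef)  # roughly [60, 0.5, -3]
```

If the DST effect interacts with temperature or hour-of-day, the single dummy will not capture it, and you are back to something closer to the two-model approach.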
Again, the big question is: how much time and effort does it make sense to devote to exploring this issue and its implications for forecast quality? If you are doing applied work (as I gather you are), the goal is to craft a model that is fit for purpose, rather than devote your life to finding the best of all possible models.
If you really want to explore this issue, you might look up this paper:
Ryan Kellogg and Hendrik Wolff, "Daylight time and energy: Evidence from an Australian experiment", Journal of Environmental Economics and Management, Volume 56, Issue 3, November 2008, pages 207–220. doi:10.1016/j.jeem.2008.02.003.
The authors take advantage of the fact that two Australian states at the same latitude have different rules for implementing daylight saving time. This difference creates conditions for a natural experiment on the effect of DST on energy consumption, with one state acting as the "treatment group" and its neighbor acting as the "control group". Additional background is available from Hendrik Wolff's website. It's interesting work -- though perhaps overkill for your application.
The choice of window length involves a balance between two opposing factors. A shorter window implies a smaller data set on which to perform your estimations. A longer window implies an increase in the chance that the data-generating process has changed over the time period covered by the window, so that the oldest data are no longer representative of the system's current behavior.
Suppose, for example, that you wished to estimate the January mean temperature in New York. Due to climate change, data from 40 years ago are no longer representative of current conditions. However, if you use only data from the past 5 years, your estimate will carry a large uncertainty due to natural sampling variability.
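The trade-off can be sketched numerically. Everything here is hypothetical: an assumed warming trend of 0.03 °C/year and year-to-year noise of 1.5 °C. The short window has a larger standard error; the long window has a smaller standard error but a systematic bias, because its older observations come from a cooler regime.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical January mean temperatures over 40 years, with a
# warming trend of 0.03 C/year and year-to-year noise of 1.5 C.
years = np.arange(40)
true_now = 2.0  # assumed current long-run mean
temps = true_now - 0.03 * (39 - years) + rng.normal(0, 1.5, 40)

# Estimate the "current" mean from the last 5 years vs all 40.
est_5 = temps[-5:].mean()
est_40 = temps.mean()

# Sampling error shrinks with window length...
se_5 = 1.5 / np.sqrt(5)
se_40 = 1.5 / np.sqrt(40)
# ...but trend-induced bias grows with it (mean offset over window).
bias_40 = 0.03 * 39 / 2

print(se_5, se_40, bias_40)
```

Which window wins depends on the relative sizes of the noise and the drift, which is exactly why no universal answer exists.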
Analogously, if you were trying to model the behavior of the Dow Jones Industrial Average, you could pull in data going back over a century. But you may have legitimate reasons to believe that data from the 1920s will not be representative of the process that generates the DJIA values today.
To put it in other terms, shorter windows increase your parameter risk while longer windows increase your model risk. A short data sample increases the chance that your parameter estimates are way off, conditional on your model specification. A longer data sample increases the chance that you are trying to stretch your model to cover more cases than it can accurately represent. A more "local" model may do a better job.
Your selection of window size depends, therefore, on your specific application -- including the potential costs for different kinds of error. If you were certain that the underlying data-generating process was stable, then the more data you have, the better. If not, then maybe not.
I'm afraid I can't offer more insight on how to strike this balance appropriately, without knowing more about the specifics of your application. Perhaps others can offer pointers to particular statistical tests.
What most people do in practice (not necessarily the best practice) is to eyeball it, choosing the longest window for which one can be "reasonably comfortable" that the underlying data-generating process has, during that period, not changed "much". These judgements are based on the analyst's heuristic understanding of the data-generating process.
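A step beyond eyeballing is to let out-of-sample error pick the window: compare a few candidate window lengths by rolling-origin one-step-ahead forecasts and keep the one with the lowest error. The series, the candidate windows, and the trailing-mean forecaster below are all hypothetical illustrations, not a recommendation of that particular forecaster.

```python
import numpy as np

rng = np.random.default_rng(3)

# A slowly drifting series: old data gradually stop being
# representative, so some intermediate window may do best.
n = 400
level = np.cumsum(rng.normal(0, 0.2, n))  # random-walk drift
y = level + rng.normal(0, 1.0, n)

def rolling_mae(window):
    """One-step-ahead forecast = mean of the trailing window;
    evaluated with rolling origin over the last 300 points."""
    errs = [abs(y[t] - y[t - window:t].mean()) for t in range(100, n)]
    return float(np.mean(errs))

scores = {w: rolling_mae(w) for w in (5, 20, 80)}
best = min(scores, key=scores.get)
print(scores, best)
```

The same scheme works with any forecasting model in place of the trailing mean; you are simply treating window length as a hyperparameter and validating it on held-out history.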
Best Answer
No matter what the model is, generally the more data you have, the better. If you want to make a forecast, you want your sample to be representative enough of the population as it changes. In most cases you have only partial knowledge about past changes and no knowledge about the future. Gathering more data helps you gain more confidence in the predictability of those changes (this is related to forecastability). You want to find a repeating pattern, a trend, or at least describe the random behavior of your process with some model, so you need to be confident that what you observed is somehow similar to what can possibly happen in the future.
It is possible to make time-series forecasts even with short time series (see also Rob Hyndman's blog), but generally more data means more information and a more representative sample. Think of your sample in terms of time-unit observations. If you have two years of weekly data, you have only $2\times 52 = 104$ weekly observations. If you want to forecast half a year ahead, then you should consider the fact that you have only four half-years in your data.
Imagine that you work with weather data and want to make a half-year-ahead forecast about temperature. There is noticeable seasonality in temperature; e.g. in central England, temperature tends to rise in the first half of the year and drop in the second half (Parker et al., 1992). If you had two years of temperature data and wanted to forecast temperature between January and June, then only two of the four half-years of your data would be relevant, because of the seasonality (data about the drop in temperature in the second half of the year do not tell you much about the rise in the first half).
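The "effective sample" point can be sketched as follows, on a toy sinusoidal annual cycle (the cycle shape and amplitude are invented for illustration, not fitted to the CET data): of 24 monthly observations, only the ones from the matching half of the year inform a January-June forecast.

```python
import numpy as np

# Two years of hypothetical monthly temperatures (24 values),
# with a simple annual cycle peaking mid-year.
months = np.arange(24)
temps = 10 + 8 * np.sin(2 * np.pi * (months % 12) / 12 - np.pi / 2)

# To forecast January-June, keep only first-half-of-year months:
first_half = (months % 12) < 6
train = temps[first_half]

print(train.size)  # 12 of the 24 observations are season-relevant
```

So "two years of data" is really only two realizations of the seasonal segment you care about, which is a very small sample of that segment's behavior.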
[Figure: monthly Central England Temperature series; source: http://www.metoffice.gov.uk/hadobs/hadcet/cetml1659on.dat]
If there are cycles or seasonality (as with temperature data) that you may assume will repeat in the future, or a trend that will continue, then data that "catch" this pattern may be enough. However, the pattern can change; consider, for example, the copper dataset from the R fma library. Looking only at the data up to the year 1920 would lead you to totally different conclusions than looking at the data after that year (even the average price differs).
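A crude check for this kind of regime change is to compare the mean level before and after a candidate breakpoint, scaled by the overall spread. The series below is synthetic, merely shaped like the copper example (a level shift at a breakpoint); the 60/40 split and the numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# A hypothetical commodity-price-like series whose mean level
# shifts at a breakpoint, as with the copper data around 1920.
pre = rng.normal(100, 5, 60)   # regime before the break
post = rng.normal(60, 5, 40)   # regime after the break
series = np.concatenate([pre, post])

# Crude diagnostic: gap between before/after means, relative to
# the overall standard deviation of the series.
k = 60
gap = abs(series[:k].mean() - series[k:].mean()) / series.std()
print(gap)  # a large gap flags a possible regime change
```

More formal tools exist (Chow tests, CUSUM, changepoint-detection packages), but even this diagnostic makes the point: averaging across both regimes would describe neither.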
In the case of multivariate data you are looking at changes in multiple variables over time, so you should consider whether you have enough information about each of the variables. As an example, let me use the bank data from the fma library, which describe deposits in a mutual savings bank in a large metropolitan area, with three variables available: end-of-month balance (EOM), composite AAA bond rates (AAA), and US Government 3-4 year bonds (threefour). If you plot these series, you can see that both the individual variables and their mutual relations change over time.
Before building your model, you should consider whether you have enough information about changes over time in your variables and in their mutual relations. The answer to the question of whether you have enough data for your forecast horizon unfortunately depends heavily on what your data are (see Optimal forecast window for timeseries). You should also remember that sometimes a short stretch of a time series may suggest a pattern (e.g. the clear upward trend of AAA in the bank data before time 30) that is less obvious, or nonexistent, over the longer term. Gathering more data, in most cases, helps you build greater confidence about the behavior of the pattern you observe over time.
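One simple way to check whether the mutual relations are stable is to compute the correlation between two series over separate sub-windows. The two series below are synthetic stand-ins (a random walk and a companion series whose relationship flips mid-sample, loosely in the spirit of the AAA/threefour pair); the flip point and noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two hypothetical rate-like series whose relationship flips
# halfway through the sample.
n = 200
x = np.cumsum(rng.normal(0, 1, n))
y = np.empty(n)
y[:100] = x[:100] + rng.normal(0, 0.5, 100)    # positively related
y[100:] = -x[100:] + rng.normal(0, 0.5, 100)   # relation flips

# Correlation in each sub-window exposes the unstable relation;
# a full-sample correlation would average the two away.
c_early = float(np.corrcoef(x[:100], y[:100])[0, 1])
c_late = float(np.corrcoef(x[100:], y[100:])[0, 1])
print(c_early, c_late)
```

If sub-window correlations disagree this sharply, a model estimated on the full sample is unlikely to describe either period well, which loops back to the windowing discussion above.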
Parker, D.E., Legg, T.P., and Folland, C.K. (1992). A new daily central England temperature series, 1772–1991. International Journal of Climatology, 12(4), 317-342.