You might want to consider forecastable component analysis (ForeCA), a dimension reduction technique for time series that is specifically designed to obtain a lower-dimensional space that is easier to forecast than the original time series.

Let's look at an example of monthly sunspot numbers and for computational efficiency let's just look at the 20th century.

```
# restrict to the 20th century for computational efficiency
yy <- window(sunspot.month, 1901, 2000)
plot(yy)
```

Sunspot numbers are only a univariate time series $y_{1:T} = (y_1, \ldots, y_T)$, but we can turn this into a multivariate time series by embedding it in its lagged $(p+1)$-dimensional feature space $X_t = \left(y_t, y_{t-1}, \ldots, y_{t-p}\right)$. This is a common technique in non-linear time series analysis.

```
# each row is (y_t, y_{t-1}, ..., y_{t-23}): a 24-dimensional lag embedding
XX <- embed(yy, 24)
XX <- ts(XX, end = end(yy), freq = 12)
dim(XX)
## [1] 1166 24
```

In R you can use the **ForeCA** package to do the estimation. Note that this requires the multivariate spectrum of a $K$-dimensional time series with $T$ observations, which is stored in a $T \times K \times K$ array (one can use the symmetry/Hermitian property to halve the size). Hence it takes considerably longer to compute than dimension reduction techniques for iid data (such as PCA or ICA).
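As a rough back-of-the-envelope sketch (using the dimensions from the example above, and assuming complex doubles at 16 bytes per entry), the spectrum array is orders of magnitude larger than the data matrix itself:

```r
# storage for the T x K x K spectrum array vs. the T x K data matrix
T <- 1166; K <- 24
entries <- T * K * K                 # number of complex spectrum entries
spec_mb <- entries * 16 / 2^20       # complex double: 16 bytes
data_mb <- T * K * 8 / 2^20          # real double: 8 bytes
entries                              # 671616
round(c(spectrum = spec_mb, data = data_mb), 2)
```

This is only a storage argument; the extra compute comes from estimating all $K(K+1)/2$ cross-spectra rather than a single covariance matrix.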

So here we take the 24-dimensional time series of embedded sunspot numbers and try to find a 4-dimensional subspace that has interesting patterns that can be easily forecasted.

```
library(ForeCA)
# this can take several seconds
mod.foreca <- foreca(XX, n.comp = 4,
                     spectrum.control = list(method = "wosa"))
mod.foreca
## ForeCA found the top 4 ForeCs of 'XX' (24 time series).
## Out of the top 4 ForeCs, 0 are white noise.
##
## Omega(ForeC 1) = 53% vs. maximum Omega(XX) = 43%.
## This is an absolute increase of 9.9 percentage points (relative: 23%) in forecastability.
##
## * * * * * * * * * *
## Use plot(), biplot(), and summary() for more details.
plot(mod.foreca)
```

The biplot shows that the loadings of the first component all point in the same direction, which tells us that this component captures the overall/average pattern. The barplots on the right show that the forecastable components (ForeCs) do indeed have decreasing forecastability, and that the first component is more forecastable than the original series. In this example all original series have the same forecastability, since we used an embedding of a single series; for a general multivariate time series this is not the case.

Now what do these series look like?

```
mod.foreca$scores <- ts(mod.foreca$scores, start = start(XX),
                        freq = frequency(XX))
plot(mod.foreca$scores)
```

Indeed, the first component is more forecastable than the original series, since it is less noisy. The remaining series also show very interesting patterns that are not visible in the original series. Note that all ForeCs are orthogonal to each other, i.e., they are uncorrelated.

```
round(cor(mod.foreca$scores), 3)
## ForeC1 ForeC2 ForeC3 ForeC4
## ForeC1 1 0 0 0
## ForeC2 0 1 0 0
## ForeC3 0 0 1 0
## ForeC4 0 0 0 1
```

The spectrum of each series also gives a good idea of the different cycles that the ForeCs [sic!] capture in sunspot activity.

```
spec <- mvspectrum(mod.foreca$scores, "wosa")
plot(spec)
```

If the underlying process is a random walk with or without a drift, then no, you can't get rid of this term.

For instance, if the underlying process (data generating process, or DGP) is as follows:

$$y_t=a+ y_{t-1}+\varepsilon_t,$$
where $\varepsilon_t$ is not correlated with its past and its variance is stable at $\sigma^2$, then you have the following:
$$y_t=a+(a+ y_{t-2}+\varepsilon_{t-1}) +\varepsilon_t=
2a+ y_{t-2}+\varepsilon_{t-1}+\varepsilon_t$$

Let's look at the variance of the error term:
$$Var(\varepsilon_{t-1}+\varepsilon_t)=Var(\varepsilon_{t-1})+Var(\varepsilon_t)=2\sigma^2$$

Basically, the forecast error variance increases linearly with the forecast horizon: after $h$ steps it is $h\sigma^2$. You can't get rid of this.
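A small Monte Carlo sketch in base R (with made-up parameter values) confirms this: the $h$-step-ahead error of the natural forecast $y_t + ha$ is just the sum of the next $h$ shocks, so its variance is close to $h\sigma^2$.

```r
# h-step forecast errors of a random walk with drift a:
# the error is the sum of the h future shocks, so Var ~ h * sigma^2
set.seed(1)
sigma <- 1; a <- 0.5; h <- 10; n_sim <- 20000
err_h <- replicate(n_sim, {
  eps <- rnorm(h, sd = sigma)
  (h * a + sum(eps)) - h * a   # realized change minus forecast change
})
var(err_h)   # approximately h * sigma^2 = 10
```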

However, do you really know your underlying process (DGP)? Do you know for sure that it's a random walk?

That's where the trick lies. If for some reason your true DGP is not the one stated above but some variation of it, such as ARIMA(0,1,1), you have a chance to slightly decrease the error by estimating the moving average (MA) coefficients of the error terms. The error variance is additive only when the errors are uncorrelated; if they are negatively correlated, the variance can be somewhat smaller (or bigger).
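To make this concrete: for an ARIMA(0,1,1) DGP $y_t = y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}$, the $h$-step forecast error is $\varepsilon_{t+h} + (1+\theta)\sum_{j=1}^{h-1}\varepsilon_{t+j}$, with variance $\sigma^2\left(1 + (h-1)(1+\theta)^2\right)$, which is below the random-walk value $h\sigma^2$ whenever $\theta < 0$. A quick check with an illustrative $\theta = -0.5$:

```r
# h-step forecast error variance: ARIMA(0,1,1) vs. pure random walk
sigma2 <- 1; theta <- -0.5; h <- 10
v_arima <- sigma2 * (1 + (h - 1) * (1 + theta)^2)
v_rw    <- sigma2 * h
c(ARIMA011 = v_arima, RW = v_rw)   # 3.25 vs 10
```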

Another "trick" is to misspecify the model as autoregressive (AR). It is often very hard to differentiate between an AR process and a random walk, so you could estimate an AR process instead. Its forecast error variance does not grow without bound as it does for a random walk. Obviously, reality will show up one day in the form of large out-of-sample forecast errors, but you can keep re-estimating your model so that this will not be so obvious.
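The contrast can be sketched with the standard AR(1) formula: the $h$-step forecast error variance is $\sigma^2(1-\phi^{2h})/(1-\phi^2)$, which levels off at $\sigma^2/(1-\phi^2)$ rather than growing like $h\sigma^2$. With an illustrative $\phi = 0.95$ (a near-unit-root case that is hard to tell apart from a random walk):

```r
# h-step forecast error variance: stationary AR(1) vs. random walk
sigma2 <- 1; phi <- 0.95
h <- c(1, 10, 50, 200)
v_ar <- sigma2 * (1 - phi^(2 * h)) / (1 - phi^2)   # bounded by ~10.26
v_rw <- sigma2 * h                                  # grows linearly
round(rbind(AR1 = v_ar, RW = v_rw), 1)
```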

## Best Answer

The general approach is to model the increment process (the first difference of the cumulative-sum process) and then take a cumulative sum thereof. Cumulative sums have unit roots by construction, and such processes do not lend themselves easily to traditional statistical modelling because they are nonstationary. Estimation techniques for statistical models typically require the processes to be stationary; otherwise the parameter estimates lose their nice properties (consistency, asymptotic normality), which is a hindrance to obtaining good-quality forecasts.

So even though it may seem cumbersome, I suggest simply taking the first difference and working with it. The forecast of the first difference can then be added to the last observed value of the original process to obtain the forecast for the original process.
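A minimal sketch of this workflow in base R, on simulated toy data (the AR(1) model for the differences is just an illustrative choice; any stationary model would do):

```r
# difference -> model -> forecast -> cumulate back to levels
set.seed(42)
y  <- cumsum(rnorm(200, mean = 0.2))      # toy unit-root (cumulative-sum) series
dy <- diff(y)                             # stationary increment process
fit <- arima(dy, order = c(1, 0, 0))      # model the first difference
h <- 12
fc_dy <- predict(fit, n.ahead = h)$pred   # forecasts of the increments
fc_y  <- tail(y, 1) + cumsum(fc_dy)       # level forecasts: last value + cumsum
length(fc_y)                              # 12
```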