Solved – Summing variance of autocorrelated timeseries

Tags: autocorrelation, covariance, r, time-series, variance

Problem: When I try to calculate the variance of sums of a timeseries, I get a negative variance, mostly due to the autocovariances at large lag steps. This does not seem realistic.

I have a timeseries which is calculated from another timeseries using a regression equation.
I would like to propagate the uncertainty in the regression to the final timeseries. Then I want to sum (or take mean values of) different segments of the timeseries over different time periods, and get the uncertainty of the sums. The timeseries is originally at 1-hour frequency, and I want to sum over periods from 1 day (resampling to daily frequency) up to several years. The timeseries is strongly autocorrelated at short lag times.

For getting the variance of the sum (in the case of 3 elements being summed):
$$\operatorname{Var}(a+b+c) = \operatorname{Var}(a)+\operatorname{Var}(b)+\operatorname{Var}(c) + 2\,\bigl(\operatorname{Cov}(a,b) + \operatorname{Cov}(a,c)+\operatorname{Cov}(b,c)\bigr)$$

I use R for the calculations. I get the variances of each timeseries element as $SE^2$, where $SE$ is the standard error (se.fit) returned by R's predict() function for the regression model. The covariances I get from the autocovariance function acf().
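As a quick sanity check on that formula: for any covariance matrix, the variance of the sum equals the sum over all matrix entries, since the diagonal supplies the variances and the off-diagonal entries supply the $2 \times \operatorname{Cov}$ terms. A minimal sketch with a made-up 3x3 covariance matrix (the numbers are arbitrary, chosen only to be a valid covariance matrix):

```r
# Hypothetical 3x3 covariance matrix for elements a, b, c
Sigma <- matrix(c(4.0, 1.2, 0.5,
                  1.2, 3.0, 0.8,
                  0.5, 0.8, 2.0), nrow = 3, byrow = TRUE)

# Var(a+b+c) written out term by term, as in the formula above
varSum_identity <- sum(diag(Sigma)) + 2 * (Sigma[1, 2] + Sigma[1, 3] + Sigma[2, 3])
# Equivalent shortcut: sum over all entries of the covariance matrix
varSum_matrix <- sum(Sigma)
# both equal 14
```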

Here is some code and a selection of the data (excuse clumsy R code, I'm very new to R):

# tsY is the predicted timeseries from the regression
tsY <- c(81.4, 79.0, 83.4, 81.7, 75.7, 68.3, 62.3, 57.2, 52.6, 48.8, 45.4,
         42.6, 39.9, 37.6, 35.6, 33.8, 32.2, 30.8, 29.6, 28.4, 27.3, 26.2,
         25.0, 23.9)
# tsSE is the standard error from the prediction (se.fit)
tsSE <- c(1.55, 1.49, 1.60, 1.56, 1.41, 1.23, 1.09, 0.97, 0.87, 0.78, 0.71,
          0.65, 0.60, 0.55, 0.51, 0.48, 0.45, 0.42, 0.40, 0.38, 0.36, 0.34,
          0.32, 0.30)

tsVar <- tsSE^2

# create a matrix of the autocovariances at different lag times; diagonal is lag = 0
# rows and columns are indices in the timeseries
n <- length(tsY)
covmat <- matrix(NA_real_, n, n)
for (i in 1:n) {
  # column i holds the autocovariances of the series from element i onwards
  autocov <- acf(tsY[i:n], type = 'covariance', lag.max = n - i, plot = FALSE)
  covmat[i:n, i] <- autocov$acf[, 1, 1]
}

# sum each column below the diagonal (the pairwise covariance terms)
sumofColumns <- rep(NA_real_, ncol(covmat))
for (i in 1:ncol(covmat)) {
  sumofColumns[i] <- sum(covmat[-(1:i), i])  # empty for the last column, sums to 0
}

sumofCov <- sum(sumofColumns)        # sum of the covariances (Cov(a,b) + Cov(a,c) + ...)
sumofVar <- sum(tsVar)               # sum of the variances of each timeseries element
varofSum <- sumofVar + 2 * sumofCov  # variance of the sum of the timeseries

# inspecting covmat, the negative covariances appear at the larger lag times
acf(tsY, type = 'covariance', lag.max = length(tsY))

> sumofCov
[1] -1151.529
> varofSum
[1] -2283.246
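For what it's worth: if one is willing to assume (weak) stationarity, the loop above can be replaced by a single acf() call plus toeplitz(), which fills entry (i, j) with the lag-|i - j| autocovariance. This is a sketch of an alternative construction, not numerically identical to the loop (which re-estimates the autocovariance from each truncated tail), and it does not by itself fix the negative-sum problem:

```r
# Compact alternative under a stationarity assumption
tsY <- c(81.4, 79.0, 83.4, 81.7, 75.7, 68.3, 62.3, 57.2, 52.6, 48.8, 45.4,
         42.6, 39.9, 37.6, 35.6, 33.8, 32.2, 30.8, 29.6, 28.4, 27.3, 26.2,
         25.0, 23.9)
n <- length(tsY)
# autocovariances at lags 0 .. n-1, estimated once from the full series
gamma <- acf(tsY, type = "covariance", lag.max = n - 1, plot = FALSE)$acf[, 1, 1]
covmat2 <- toeplitz(gamma)  # symmetric n x n matrix; entry (i, j) is lag |i - j|
varofSum2 <- sum(covmat2)   # variance of the sum = sum over all matrix entries
```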

So I have the following questions:

  1. Did I completely misunderstand how to calculate the variance of sums?

  2. Is it better to use a cutoff for the maximum lag considered in the autocovariance? If so, how would one determine it? This would be especially important with the complete data, where the length is several thousand.

  3. Why is the covariance negative at large lags in this sample data? When I plot the series with plot(tsY), it looks like the covariance/correlation should remain positive, since covariance measures whether elements vary in the same direction relative to their means.

EDIT:

Comment on question 2 above:
I have realized that using n-1 lags, as in the code above, does not make much sense. There appear to be a few different ways to determine the maximum lag to consider: Box & Jenkins (1970) suggest n/4, and R's acf() defaults to 10*log10(n). This does not, however, answer the question of how to determine an appropriate cutoff for summing the covariances.
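For the record, those two rules of thumb are easy to compare; here they are for the 24-point sample above and, for illustration, for a hypothetical full series of length 5000 (`max_lags` is just a throwaway helper, not an R built-in):

```r
# Rule-of-thumb maximum lags for a series of length n
max_lags <- function(n) c(box_jenkins = floor(n / 4),       # Box & Jenkins (1970)
                          r_default   = floor(10 * log10(n)))  # acf() default

max_lags(24)    # box_jenkins = 6,    r_default = 13
max_lags(5000)  # box_jenkins = 1250, r_default = 36
```

The two rules diverge sharply as n grows, which is exactly why the choice matters for the complete data.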

Does it make sense to look at the partial autocorrelation (function pacf()), in order not to overestimate the effect of the autocovariance in the summation term? The partial autocorrelation for my data is significantly different from zero only at 1 or 2 lags. Similarly, fitting an AR model with the ar() function, I also get an order of 1 or 2.
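Those two diagnostics can be run on the posted sample as follows (a sketch only; the order selected on this short excerpt may well differ from the 1 or 2 found on the full data):

```r
tsY <- c(81.4, 79.0, 83.4, 81.7, 75.7, 68.3, 62.3, 57.2, 52.6, 48.8, 45.4,
         42.6, 39.9, 37.6, 35.6, 33.8, 32.2, 30.8, 29.6, 28.4, 27.3, 26.2,
         25.0, 23.9)
p <- pacf(tsY, plot = FALSE)  # partial autocorrelations at lags 1, 2, ...
fit <- ar(tsY)                # AR fit; order chosen by AIC (Yule-Walker by default)
fit$order                     # the selected AR order
```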

Cheers

Related post: Variance on the sum of predicted values from a mixed effect model on a timeseries

Best Answer

While your approach is not problematic per se, the problem of non-positive-definite covariance matrix estimates for (linear) functions of autocorrelated time series is a recognized small-sample problem, and a solution was offered by the econometricians Newey and West (1987): the contributions of the distant lags should be downweighted. They proposed the weights (kernel function) $$ w_l = 1 - \frac{l}{L+1} $$ for the $l$-th of the $L$ lags considered. Other forms of kernel have been proposed since.
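Applied to the sample series from the question, the downweighting can be sketched as below. The truncation lag L here is just one common choice, not prescribed by Newey and West; with these Bartlett weights the resulting estimate is nonnegative by construction:

```r
tsY <- c(81.4, 79.0, 83.4, 81.7, 75.7, 68.3, 62.3, 57.2, 52.6, 48.8, 45.4,
         42.6, 39.9, 37.6, 35.6, 33.8, 32.2, 30.8, 29.6, 28.4, 27.3, 26.2,
         25.0, 23.9)
n <- length(tsY)
L <- floor(10 * log10(n))                 # truncation lag (one common choice)
# autocovariances at lags 0 .. L
gamma <- acf(tsY, type = "covariance", lag.max = L, plot = FALSE)$acf[, 1, 1]
w <- 1 - (1:L) / (L + 1)                  # Bartlett weights for lags 1 .. L
lrv <- gamma[1] + 2 * sum(w * gamma[-1])  # downweighted long-run variance estimate
```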
