Both tests look for displacement of the x variable relative to the y variable, but the two tests attach opposite meanings to the term "greater" (and therefore also to "less").
In ks.test, "greater" means that the CDF of 'x' is higher than the CDF of 'y', which means that quantities like the mean and the median will be smaller in 'x' than in 'y' when the CDF of 'x' is "greater" than the CDF of 'y'. In wilcox.test and t.test, the mean, median, etc. will be greater in 'x' than in 'y' if you believe that the alternative of "greater" is true.
An example from R:
> x <- rnorm(25)
> y <- rnorm(25, 1)
>
> ks.test(x,y, alt='greater')
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.6, p-value = 0.0001625
alternative hypothesis: two-sided
> wilcox.test( x, y, alt='greater' )
Wilcoxon rank sum test
data: x and y
W = 127, p-value = 0.9999
alternative hypothesis: true location shift is greater than 0
> wilcox.test( x, y, alt='less' )
Wilcoxon rank sum test
data: x and y
W = 127, p-value = 0.000101
alternative hypothesis: true location shift is less than 0
Here I generated two samples from a normal distribution, both with sample size 25 and standard deviation 1. The x variable comes from a distribution with mean 0 and the y variable from a distribution with mean 1. You can see that ks.test gives a very significant result testing in the "greater" direction even though x has the smaller mean; this is because the CDF of x is above that of y. The wilcox.test function shows a lack of significance in the "greater" direction, but a similar level of significance in the "less" direction.
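To see why, it helps to plot the two empirical CDFs (a minimal sketch, reusing the x and y generated above):
# The curve for x sits above the curve for y because x tends to take
# smaller values, so its CDF climbs toward 1 sooner.
plot(ecdf(x), main = "Empirical CDFs of x and y", xlab = "value")
lines(ecdf(y), col = "red")
legend("bottomright", legend = c("x (mean 0)", "y (mean 1)"),
       col = c("black", "red"), lty = 1)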
Both tests are different approaches to testing the same idea, but what "greater" and "less" mean to the two tests is different (and conceptually opposite).
$\widehat{\gamma}$ is used to create covariance matrices: given "times" $t_1, t_2, \ldots, t_k$, it estimates that the covariance of the random vector $X_{t_1}, X_{t_2}, \ldots, X_{t_k}$ (obtained from the random field at those times) is the matrix $\left(\widehat{\gamma}(t_i - t_j), 1 \le i, j \le k\right)$. For many problems, such as prediction, it is crucial that all such matrices be nonsingular. As putative covariance matrices they cannot have any negative eigenvalues, so together with nonsingularity this means they must all be positive-definite.
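As a concrete illustration (a minimal sketch; the helper name build_cov and the AR(1)-style autocovariance $0.5^h$ are my own choices for the example, not from the original), such a matrix can be assembled and checked in R:
# Assemble the putative covariance matrix (gamma_hat(t_i - t_j)) for a
# set of times, then check that every eigenvalue is strictly positive.
build_cov <- function(gamma_hat, times) {
  outer(times, times, function(s, t) gamma_hat(abs(s - t)))
}
G <- build_cov(function(h) 0.5^h, times = 1:4)  # illustrative gamma(h) = 0.5^h
eigen(G, symmetric = TRUE)$values  # all strictly positive: positive-definite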
The simplest situation in which the distinction between the two formulas
$$\widehat{\gamma}(h) = n^{-1}\sum_{t=1}^{n-h}(x_{t+h}-\bar{x})(x_t-\bar{x})$$
and
$$\widehat{\gamma}_0(h) = (n-h)^{-1}\sum_{t=1}^{n-h}(x_{t+h}-\bar{x})(x_t-\bar{x})$$
appears is when $x$ has length $2$; say, $x = (0,1)$. For $t_1=t$ and $t_2 = t+1$ it's simple to compute
$$\widehat{\gamma}_0 = \left(
\begin{array}{cc}
\frac{1}{4} & -\frac{1}{4} \\
-\frac{1}{4} & \frac{1}{4}
\end{array}
\right),$$
which is singular, whereas
$$\widehat{\gamma} = \left(
\begin{array}{cc}
\frac{1}{4} & -\frac{1}{8} \\
-\frac{1}{8} & \frac{1}{4}
\end{array}
\right)$$
has eigenvalues $3/8$ and $1/8$, whence it is positive-definite.
A similar phenomenon happens for $x = (0,1,0,1)$, where $\widehat{\gamma}$ is positive-definite but $\widehat{\gamma}_0$--when applied to the times $t_i = (1,2,3,4)$, say--degenerates into a matrix of rank $1$ (its entries alternate between $1/4$ and $-1/4$).
(There is a pattern here: problems arise for any $x$ of the form $(a,b,a,b,\ldots,a,b)$.)
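These toy cases are easy to verify numerically. A minimal sketch in R (the function names gamma_hat and gamma_hat0 are mine, coded directly from the two formulas above):
# Divisor n: the positive-definite estimate gamma_hat
gamma_hat <- function(x, h) {
  n <- length(x); xbar <- mean(x)
  sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
}
# Divisor (n - h): the "unbiased" estimate gamma_hat0
gamma_hat0 <- function(x, h) {
  n <- length(x); xbar <- mean(x)
  sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / (n - h)
}
x <- c(0, 1)
G  <- outer(1:2, 1:2, Vectorize(function(s, t) gamma_hat(x, abs(s - t))))
G0 <- outer(1:2, 1:2, Vectorize(function(s, t) gamma_hat0(x, abs(s - t))))
eigen(G)$values   # 3/8 and 1/8: positive-definite
eigen(G0)$values  # 1/2 and 0: singular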
In most applications the series of observations $x_t$ is so long that for most $h$ of interest--which are much less than $n$--the difference between $n^{-1}$ and $(n-h)^{-1}$ is of no consequence. So in practice the distinction is no big deal and theoretically the need for positive-definiteness strongly overrides any possible desire for unbiased estimates.
Best Answer
Yes, that's correct. One is based on random data, and the other is theoretical and based on properties of the true model along with its true parameters.
You don't tell us which function you're using in that package, but you can calculate both sample and theoretical autocorrelations in R. Below is a demonstration. Notice that if you re-generate fake_data, you will get different data, and thus different sample autocorrelations. However, as long as you do not change the true model (an ARMA(2,2) in this case), the theoretical acf will not change.
In the itsmr package, which I try to avoid whenever I can, the theoretical autocovariance (not autocorrelation) function is calculated by aacvf, while the sample one is calculated by acvf.
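Here is a minimal sketch of the kind of demonstration described, using only base R (arima.sim, acf, and ARMAacf); the ARMA(2,2) coefficients are illustrative placeholders, not from the original answer:
set.seed(1)
# Simulate fake_data from a known ARMA(2,2); the coefficients here are
# placeholders chosen to give a stationary, invertible model.
ar_coef <- c(0.5, -0.2)
ma_coef <- c(0.4, 0.1)
fake_data <- arima.sim(model = list(ar = ar_coef, ma = ma_coef), n = 500)

# Sample autocorrelations: these change every time fake_data is re-generated
sample_acf <- acf(fake_data, lag.max = 10, plot = FALSE)$acf

# Theoretical autocorrelations: fixed by the model parameters alone
theo_acf <- ARMAacf(ar = ar_coef, ma = ma_coef, lag.max = 10)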