Solved – Autocorrelation and Statistically Independent Samples

autocorrelation, confidence interval, sample size, time series

I'm trying to do an error analysis and was asked to calculate confidence intervals, but I was told that I first need to find the true number of statistically independent samples. I am not very familiar with statistics and really have no idea what I'm supposed to do.

I was told to find the autocorrelation and then determine how many lags out I should go before deciding the data are no longer correlated. The data are 40960 samples (a sampling frequency of 4096 Hz for ten seconds) of voltages measured on a load cell in a wind tunnel.

So I did some research and am using the autocorr function in MATLAB, but I still don't understand a few things:

1) How many lags should I use?

2) How do I use the output of autocorr to find the number of independent samples?

I've included a picture of what MATLAB displays after using the autocorr function. The function's default number of lags is 20.

[Figure: sample autocorrelation plot produced by MATLAB's autocorr with the default 20 lags]

Best Answer

I'm not sure what an "error analysis" is, but I suspect that this all might involve calculating the standard deviation of $\bar{X}$ under two different assumptions.

Case 1: If your data are uncorrelated (or perhaps independent) and all have the same variance, then $$ \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{\gamma(0)}{n} $$ where $n$ is the sample size, and $\sigma^2 = \gamma(0) = \operatorname{Var}(X_i)$.
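For concreteness, here is a minimal MATLAB sketch of the Case 1 calculation, assuming the 40960 voltage samples are stored in a vector x (the variable name is hypothetical, not from the original post):

```matlab
% iid case: standard error of the sample mean, ignoring autocorrelation.
% Assumes the load-cell voltages are in a vector x (hypothetical name).
n       = numel(x);
gamma0  = var(x, 1);       % biased sample variance, estimates gamma(0)
varXbar = gamma0 / n;      % Var(Xbar) under the iid assumption
seXbar  = sqrt(varXbar);   % standard error of the mean
```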

Case 2: If your data are a stationary time series with mean $\mu$ and absolutely summable autocovariance function $\gamma(\cdot)$, then $$ n\operatorname{Var}(\bar{X}) \to \sum_{j=-\infty}^{\infty}\gamma(j) = \gamma(0) + 2\sum_{j=1}^{\infty}\gamma(j), $$ so for large $n$ the variance of $\bar{X}$ is approximately $\sum_{j=-\infty}^{\infty}\gamma(j)/n$.
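A sketch of the Case 2 approximation, again assuming the data are in a vector x; the truncation lag B = 200 is an arbitrary choice for illustration, not a recommendation:

```matlab
% Stationary case: approximate Var(Xbar) using sample autocovariances
% up to a truncation lag B (B = 200 is an arbitrary illustrative choice).
n  = numel(x);
B  = 200;
xc = x(:) - mean(x);
gammaHat = zeros(B + 1, 1);          % gammaHat(j+1) estimates gamma(j)
for j = 0:B
    gammaHat(j + 1) = sum(xc(1:n-j) .* xc(1+j:n)) / n;
end
longRunVar = gammaHat(1) + 2 * sum(gammaHat(2:end));
varXbar    = longRunVar / n;         % approximate Var(Xbar)
```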

Effective sample size refers to solving the following equation for $n_{\text{eff}}$: $$ \frac{\hat{\gamma}(0) + 2\sum_{j=1}^{B}\hat{\gamma}(j)}{n} = \frac{\hat{\gamma}(0)}{n_{\text{eff}}}, \tag{1} $$ where $B$ is some large number you pick (you can't sum an infinite number of autocovariances, so you must truncate). Solving gives $$ n_{\text{eff}} = \frac{n\,\hat{\gamma}(0)}{\hat{\gamma}(0) + 2\sum_{j=1}^{B}\hat{\gamma}(j)}. $$ You have $n$ samples, but your samples are correlated, so $n_{\text{eff}}$ is the hypothetical number of iid samples that would give the same standard error. If your data are highly correlated, $n_{\text{eff}}$ turns out to be very low, which tells you how inefficient your estimator is. Take care not to pick $B$ too small; it is likely too small if increasing it slightly changes the sum drastically. Look at the cumulative sums and pick $B$ large enough that they appear to have stabilized (see the sketch below).
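Putting equation (1) into code, here is a sketch that computes $n_{\text{eff}}$ and plots the cumulative sums as a diagnostic for choosing $B$; it reuses the gammaHat vector and n from the previous sketch:

```matlab
% Effective sample size from equation (1), reusing gammaHat and n above.
nEff = n * gammaHat(1) / (gammaHat(1) + 2 * sum(gammaHat(2:end)));

% Diagnostic for choosing B: the cumulative long-run-variance sums should
% level off; if they are still drifting at lag B, increase B and re-run.
cumSums = gammaHat(1) + 2 * cumsum(gammaHat(2:end));
figure;
plot(1:B, cumSums);
xlabel('truncation lag'); ylabel('cumulative sum of autocovariances');
```

Once you have $n_{\text{eff}}$, a rough confidence interval for the mean follows directly from equation (1): $\bar{x} \pm 1.96\sqrt{\hat{\gamma}(0)/n_{\text{eff}}}$, which is just the usual iid formula with $n_{\text{eff}}$ in place of $n$.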
