ACF Function – How to Calculate the Confidence Interval for the ACF Function

autocorrelation, confidence interval, r

For example, in R if you call the acf() function it plots a correlogram by default, and draws a 95% confidence interval. Looking at the code, if you call plot(acf_object, ci.type="white"), you see:

qnorm((1 + ci)/2)/sqrt(x$n.used)

as the upper limit for the white-noise type. Can someone explain the theory behind this method? Why do we take qnorm of (1 + 0.95)/2 and then divide by the square root of the number of observations?

Best Answer

In Chatfield's Analysis of Time Series (1980), he gives a number of methods of estimating the autocovariance function, including the jack-knife method. He also notes that, for a purely random series, it can be shown that the autocorrelation coefficient at lag k, $r_k$, is normally distributed in the limit, and that Var($r_k$) ≈ 1/N (where N is the number of observations). These two observations are pretty much the core of the issue. He doesn't give a derivation for the first observation, but references Kendall & Stuart, The Advanced Theory of Statistics (1966).

Now, we want α/2 in each tail for the two-tailed test, so we want the 1−α/2 quantile of the standard normal.

Here ci is the confidence level 1−α, so (1+ci)/2 = (1+1−α)/2 = 1−α/2. Multiply that quantile by the standard deviation (i.e. the square root of the variance found above, 1/√N) to get the upper limit.
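As a quick check, the bound can be reproduced by hand. This is only a minimal sketch on a simulated white-noise series; the seed, the series length of 200 and the variable names are illustrative assumptions, not part of the question.

```r
# Simulated white-noise series (illustrative assumption, not the asker's data)
set.seed(1)
x <- rnorm(200)

acf_obj <- acf(x, plot = FALSE)  # estimate the ACF without drawing the plot
ci <- 0.95                       # confidence level, i.e. 1 - alpha
alpha <- 1 - ci

# The expression quoted in the question for ci.type = "white"
bound <- qnorm((1 + ci) / 2) / sqrt(acf_obj$n.used)

# Equivalent form: the 1 - alpha/2 normal quantile times sd = sqrt(1/N)
bound_alt <- qnorm(1 - alpha / 2) * sqrt(1 / acf_obj$n.used)

# Share of nonzero lags whose sample autocorrelation lies inside the band
inside <- mean(abs(acf_obj$acf[-1]) <= bound)
```

Both `bound` and `bound_alt` give the same number, which is the height of the dashed lines in the correlogram; for a genuinely white-noise series, roughly 95% of the nonzero lags should fall inside ±`bound`.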

Related Solutions

Confidence Interval – Determining Sample Sizes for Binomial Confidence Intervals

(1) Yes.

(2) Yes. There are only $n+1$ possible outcomes for a binomial random variable, so it is possible to look at what happens for each possible outcome - in fact this is faster than simulating lots and lots of outcomes!

Let $X$ be the number of "successes" among the $n$ customers and let $\hat{p}=X/n$. The confidence interval is $\hat{p}\pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, so the halfwidth is $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Thus we want to compute $P(z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\leq 0.005)$. In R, we can do this as follows:

target.halfWidth<-0.005

p<-0.016 #true proportion
n.vec<-seq(from=1000, to=3000, by=100) #number of samples

# Vector to store results
prob.hw<-rep(NA,length(n.vec))

# Loop through desired sample size options
for (i in seq_along(n.vec)) {
  n <- n.vec[i]

  # Look at all possible outcomes
  x <- 0:n
  p.est <- x/n

  # Compute halfwidth for each option
  halfWidth <- qnorm(0.95) * sqrt(p.est * (1 - p.est) / n)

  # What is the probability that the halfwidth is less than 0.005?
  prob.hw[i] <- sum((halfWidth <= target.halfWidth) * dbinom(x, n, p))
}

# Plot results
plot(n.vec,prob.hw,type="b")
abline(0.95,0,col=2)

# Get the minimal n required
n.vec[min(which(prob.hw>=0.95))]

The answer is $n=2200$ in this case as well.

Finally, it is usually a good idea to verify that the asymptotic normal approximation interval actually gives the desired coverage. In R, we can compute the coverage probability (i.e. the actual confidence level) as:

p<-0.016
n<-2200
x<-0:n
p.est<-x/n
halfWidth<-qnorm(0.95)*sqrt(p.est*(1-p.est)/n)
# Coverage probability
sum((abs(p - p.est) <= halfWidth) * dbinom(x, n, p))

Different $p$ give different coverages. For $p$ around $0.015$, the actual confidence level of the nominal $90\%$ interval seems to be about $89\%$ in general, which I presume is fine for your purposes.

(3) When you sample from a finite population, the number of successes is not binomial but hypergeometric. If the population is large compared to your sample size, the binomial works just fine as an approximation. If you sample 1000 out of 5000, say, it does not. Have a look at confidence intervals for proportions based on the hypergeometric distribution!
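To see why sampling 1000 out of 5000 matters, one can compare the standard deviation of the sample proportion under the two models. This is a rough sketch, assuming a population of 5000 with a true proportion of 0.016 (so 80 "successes"); the variable names are mine.

```r
N <- 5000            # population size (assumed for illustration)
n <- 1000            # sample size
p <- 0.016           # true proportion
K <- round(N * p)    # number of successes in the population (80)

x <- 0:n             # all possible numbers of successes in the sample

# Standard deviation of the sample proportion under the binomial model
sd_binom <- sqrt(sum((x / n - p)^2 * dbinom(x, n, p)))

# Same quantity under the hypergeometric model (sampling without replacement)
sd_hyper <- sqrt(sum((x / n - K / N)^2 * dhyper(x, K, N - K, n)))

# Finite population correction factor
fpc <- sqrt((N - n) / (N - 1))

# The ratio of the two standard deviations equals the correction factor
sd_hyper / sd_binom
fpc
```

Here the ratio is about 0.89, so binomial-based intervals would be noticeably too wide. When the sample is a small fraction of the population the factor is close to 1 and the binomial approximation is harmless.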

Answers to additional questions:

Let $(p_L,p_U)$ be the confidence interval.

1) In that case you are no longer computing $P(p_U-p_L\leq0.01)$ but $$P\Big(p_U-p_L\leq0.01~\mbox{and}~p\in(p_L,p_U)\Big),$$ i.e. the probability that the interval is at most 0.01 long and actually contains $p$. This may be an interesting quantity, depending on what you're interested in...

2) Maybe, but probably not. If the population size is large compared to the sample size you don't need it, and if it's not then the binomial distribution is not appropriate to begin with!

3) Sprop seems to contain confidence intervals based on the hypergeometric distribution, so that should work just fine.
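The joint probability in answer 1) can be computed with the same enumeration as before. This is a sketch reusing $p=0.016$ and $n=2200$ from the earlier code; the names `short`, `covers`, `prob_short` and `prob_joint` are introduced here for illustration.

```r
p <- 0.016
n <- 2200
x <- 0:n
p.est <- x / n
halfWidth <- qnorm(0.95) * sqrt(p.est * (1 - p.est) / n)

# Interval length at most 0.01, i.e. halfwidth at most 0.005
short <- halfWidth <= 0.005

# Interval contains the true proportion
covers <- abs(p - p.est) <= halfWidth

# Probability that the interval is short enough
prob_short <- sum(short * dbinom(x, n, p))

# Probability that the interval is short enough AND contains p
prob_joint <- sum((short & covers) * dbinom(x, n, p))
```

Since the second event is a subset of the first, `prob_joint` can never exceed `prob_short`, so a sample size chosen using the joint criterion will be at least as large.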

Solved – How to interpret this ACF and PACF plot

The ACF and PACF are descriptive statistics showing simple correlation and conditional correlation. They are sometimes useful in identifying ARIMA models for series that 1) have no pulses (your series does), 2) have no deterministic time trends or level shifts (your series seems to have a period with no trend followed by a period with a trend), and 3) have parameters and model error variance that are constant over time. If you post your data in Excel format I will try to help you understand more, but my initial visual assessment is that this is a time series that may have complications/opportunities that will challenge simple (1960s-type!) ARIMA analysis or simple list-based model selection schemes that use an AIC/BIC criterion to identify a model. We should aspire to keep models simple, but not so simple that they are of little value!

EDIT AFTER RECEIPT OF DATA:

I took the 269 monthly values and analyzed them with the automatic option in AUTOBOX. After identifying a global model that takes various pulse effects into account, a significant difference was found between the first 169 values and the most recent 99 values. This is visually obvious. The best model for the most recent 99 values was [image not preserved]. The residuals from the model appear well behaved [image not preserved]. The ACF of the residuals suggests a slight possibility of a minor seasonal effect, but it is most likely not important. The plot of the actual, fit and forecast values, using standard ARIMA procedures to compute uncertainty limits, is [image not preserved]. Note, however, from http://www.autobox.com/cms/index.php/blog/entry/you-should-have-50-confidence-in-your-confidence-limits that the forecast limits are flawed on two counts. Taking into account the possibility of future values being affected by pulses (i.e. unusual values), and that the estimated parameters are not necessarily the population parameters, we get a more realistic picture.
