Solved – Confidence interval for chi-square

chi-squared-testconfidence intervalr

I am trying to find a solution to compare two "goodness-of-fit chi-square" tests.
More precisely, I want to compare results from two independent experiments. In these experiments the authors used the goodness-of-fit chi-square to compare random guessing (expected frequencies) with observed frequencies. The two experiments got the same number of participants and experimental procedures are identical, only the stimuli changed. The two experiments results indicated a significant chi-square (exp. 1 : X²(18)=45; p<.0005 and exp. 2 : X²(18)=79; p<.0001).

Now, what I want to do is to test if there is a difference between these two results. I think a solution could be the use of confidence intervals but I don't know how to calculate these confidence intervals only with these results. Or maybe a test to compare effect size (Cohen's w)?

Anyone have a solution?

Thanks a lot!

F.D.

Best Answer

The very limited information you have is certainly a severe constraint! However, things aren't entirely hopeless.

Under the same assumptions that lead to the asymptotic $\chi^2$ distribution for the test statistic of the goodness-of-fit test of the same name, the test statistic under the alternative hypothesis has, asymptotically, a noncentral $\chi^2$ distribution. If we assume the two stimuli are a) significant, and b) have the same effect, the associated test statistics will have the same asymptotic noncentral $\chi^2$ distribution. We can use this to construct a test - basically, by estimating the noncentrality parameter $\lambda$ and seeing whether the test statistics are far in the tails of the noncentral $\chi^2(18, \hat{\lambda}) $ distribution. (That's not to say this test will have much power, though.)

We can estimate the noncentrality parameter given the two test statistics by taking their average and subtracting the degrees of freedom (a methods of moments estimator), giving an estimate of 44, or by maximum likelihood:

x <- c(45, 79)
n <- 18

ll <- function(ncp, n, x) sum(dchisq(x, n, ncp, log=TRUE))
foo <- optimize(ll, c(30,60), n=n, x=x, maximum=TRUE)
> foo$maximum
[1] 43.67619

Good agreement between our two estimates, not actually surprising given two data points and the 18 degrees of freedom. Now to calculate a p-value:

> pchisq(x, n, foo$maximum)
[1] 0.1190264 0.8798421

So our p-value is 0.12, not sufficient to reject the null hypothesis that the two stimuli are the same.

Does this test actually have (roughly) a 5% reject rate when the noncentrality parameters are the same? Does it have any power? We'll attempt to answer these questions by constructing a power curve as follows. First, we fix the average $\lambda$ at the estimated value of 43.68. The alternative distributions for the two test statistics will be noncentral $\chi^2$ with 18 degrees of freedom and noncentrality parameters $(\lambda-\delta, \lambda+\delta)$ for $\delta = 1, 2, \dots, 15$. We'll simulate 10000 draws from these two distributions for each $\delta$ and see how often our test rejects at, say, the 90% and 95% level of confidence.

nreject05 <- nreject10 <- rep(0,16)
delta <- 0:15
lambda <- foo$maximum
for (d in delta)
{
  for (i in 1:10000)
  {
    x <- rchisq(2, n, ncp=c(lambda+d,lambda-d))
    lhat <- optimize(ll, c(5,95), n=n, x=x, maximum=TRUE)$maximum
    pval <- pchisq(min(x), n, lhat)
    nreject05[d+1] <- nreject05[d+1] + (pval < 0.05)
    nreject10[d+1] <- nreject10[d+1] + (pval < 0.10)
  }
}
preject05 <- nreject05 / 10000
preject10 <- nreject10 / 10000

plot(preject05~delta, type='l', lty=1, lwd=2,
     ylim = c(0, 0.4),
     xlab = "1/2 difference between NCPs",
     ylab = "Simulated rejection rates",
     main = "")
lines(preject10~delta, type='l', lty=2, lwd=2)
legend("topleft",legend=c(expression(paste(alpha, " = 0.05")),
                          expression(paste(alpha, " = 0.10"))),
       lty=c(1,2), lwd=2)

which gives the following:

enter image description here

Looking at the true null hypothesis points (x-axis value = 0), we see that the test is conservative, in that it doesn't appear to reject as often as the level would indicate, but not overwhelmingly so. As we expected, it doesn't have much power, but it's better than nothing. I wonder if there are better tests out there, given the very limited amount of information you have available.

Related Solutions

Solved – Alternative to Pearson’s chi-square goodness of fit test, when expected counts < 5

I think you are asking for the "Multinomial Exact Test", which can exactly compute the p-value for whether a multinomial random variable (which takes any of a certain set of values) follows a certain distribution.

R Confidence Interval – How to Find the Difference in Proportions Using Chi-Squared Test

First, let's see if there are differences in the proportion working across the four groups A, B, C, D. (Data similar to yours.)

w = c(90, 32, 9, 3)
nw = c(46, 7 , 8, 5)
TBL = rbind(w, nw)
chisq.test(TBL)

        Pearson's Chi-squared test

data:  TBL
X-squared = 8.7062, df = 3, p-value = 0.03346

Warning message:
In chisq.test(TBL) : 
 Chi-squared approximation may be incorrect

The low cell counts in groups C and D, trigger a warning message, putting the validity of the P-value into doubt. The version of 'chisq.test` implemented in R, allows for simulation of a more accurate P-value, showing a significant effect at the 5% level.

chisq.test(TBL, sim=T)$p.val
[1] 0.03098451

Significance barely at the 5% level does not invite extensive ad hoc tests. To avoid false discovery they should show significance at lower levels. Furthermore, it is not clear just which confidence intervals would be of interest. A look at the Pearson residuals to see if there groups that are strikingly different, possibly suggests comparing groups A and B. However, the level of significance there is unimpressive, especially if we protect against false discovery.

chisq.test(TBL)$resi
         [,1]      [,2]       [,3]      [,4]
w  -0.1173306  1.148334 -0.7081676 -1.019365
nw  0.1671828 -1.636247  1.0090588  1.452480

chisq.test(TBL[,c(1,2)], cor=F)

        Pearson's Chi-squared test

data:  TBL[, c(1, 2)]
X-squared = 3.6176, df = 1, p-value = 0.05717

You have already said you know how to use 'prop.test' to get a 95% confidence interval for the difference of proportions in A and B.

I don't see a point in looking at other pairs of groups---especially not, in view of the low counts there. Maybe you would like to compare group A with the other three groups combined, but 'prop.test' can handle that.

If you had additional kinds of analyses in mind using confidence intervals, please be more specific, and maybe one of us can help.

Best Answer

Related Solutions

Solved – Alternative to Pearson’s chi-square goodness of fit test, when expected counts < 5

R Confidence Interval – How to Find the Difference in Proportions Using Chi-Squared Test

Related Question