Solved – $\chi^2$ tests to compare the fit of logistic models in large samples

chi-squared-test, logistic, psychometrics

Does anyone know of any $\chi^2$ tests for comparing the fit of logistic models that factor out the sample size? I'm dealing with a very large sample (>200,000 cases), and I fear that the significant $\chi^2$ test I get when adding a single variable to the model is simply a result of the sample size. I'm doing what is known as differential item functioning (DIF) analysis with logistic regression. Basically, I'm checking whether giving the right answer to a question (the dependent variable) depends on your ethnicity when controlling for the total exam score.

Model 1: Q1 ~ TotalexamScore

Model 2: Q1 ~ TotalexamScore + Group

I'm basically using a chi-squared test to compare Model 1 with Model 2. The significance of the coefficient is not that important; $\chi^2$ and sometimes $R^2$ are the statistics generally recommended for checking differential item functioning. My problem is that my sample is very large. In theory (for the question I'm considering) there should be no real difference across groups, so I suspect the significant result simply reflects the sensitivity of the $\chi^2$ test to sample size.
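A minimal R sketch of the comparison described above, using simulated data. The variable names follow the question, but the data, the sample size and the small group effect (0.05 on the logit scale) are assumptions for illustration only. Alongside the likelihood-ratio $\chi^2$ from `anova()`, it computes the change in Nagelkerke $R^2$ from the model deviances, which, unlike the $\chi^2$ statistic, does not grow in proportion to $N$ for a fixed effect.

```r
# Hypothetical data: Q1 (0/1 item response), TotalexamScore, Group (0/1).
# The tiny group effect (0.05 on the logit scale) is an arbitrary assumption.
set.seed(1)
n <- 200000
TotalexamScore <- rnorm(n)
Group <- rbinom(n, 1, 0.3)
Q1 <- rbinom(n, 1, plogis(-0.2 + 1.5 * TotalexamScore + 0.05 * Group))

m1 <- glm(Q1 ~ TotalexamScore, family = binomial)
m2 <- glm(Q1 ~ TotalexamScore + Group, family = binomial)

# Likelihood-ratio chi-square for adding Group (1 df)
lrt <- anova(m1, m2, test = "Chisq")
lr_stat <- lrt$Deviance[2]
lr_p <- lrt[["Pr(>Chi)"]][2]

# Nagelkerke R^2 from deviances; for ungrouped 0/1 data, deviance = -2 logLik
nagelkerke <- function(fit) {
  N <- nobs(fit)
  cox_snell <- 1 - exp((fit$deviance - fit$null.deviance) / N)
  cox_snell / (1 - exp(-fit$null.deviance / N))  # rescale to a maximum of 1
}
delta_r2 <- nagelkerke(m2) - nagelkerke(m1)

c(LR = lr_stat, p = lr_p, delta_R2 = delta_r2)
```

With settings like these, the $\chi^2$ can come out clearly significant while the $R^2$ change is tiny, which is the pattern described in the question.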

I'd rather use the whole dataset than take (small) random samples, as it is highly skewed. I've seen measures like phi and Cramér's V for crosstabs, but I'm not sure whether they have been used with logistic regression, whether there are better ones, or whether any packages implement them (I generally use SPSS, Mplus, Stata, and R).

Best Answer

One heuristic way to take account of sample size is to make a random group variable that has the same marginal propensity as your "Group" variable. Then check the chi-square statistic for this random group. If it's greater than the chi-square for your variable, then you have a fair case for dismissing the effect as noise. A more robust version is to create many noise variables and see whether any of their chi-square statistics are greater than the chi-square for your variable.
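The noise-variable heuristic can be sketched in R as follows. This is a minimal illustration on simulated data with no true group effect; the function name `noise_chisq` and the choice of 50 noise variables are hypothetical. Each noise variable is a random permutation of the group indicator (one way to get exactly the same marginal propensity), and each fit reports the likelihood-ratio $\chi^2$ for adding that variable to the score-only model.

```r
# Hypothetical helper: LR chi-square for the real group vs. permuted copies.
noise_chisq <- function(y, score, group, n_noise = 50) {
  base <- glm(y ~ score, family = binomial)
  lr <- function(g) {
    full <- glm(y ~ score + g, family = binomial)
    base$deviance - full$deviance        # likelihood-ratio chi-square, 1 df
  }
  list(observed = lr(group),
       # sample(group) keeps the marginal proportion but breaks any link to y
       noise = replicate(n_noise, lr(sample(group))))
}

set.seed(2)
n <- 5000
score <- rnorm(n)
group <- rbinom(n, 1, 0.3)
y <- rbinom(n, 1, plogis(score))         # simulated with no true group effect
res <- noise_chisq(y, score, group)
mean(res$noise >= res$observed)          # share of noise variables beating Group
```

Counting how many noise variables beat the real one turns the heuristic into what is essentially a permutation test: a large share suggests the "Group" effect is hard to distinguish from noise.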

Another thing you should do is examine the beta coefficients, or "effect sizes", for the "Group" variable. Do they make intuitive sense? For example, can you explain why a coefficient should be positive or negative? Can you explain why its magnitude should be bigger or smaller than that of the other coefficients?
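One way to put the coefficients on an interpretable scale is to exponentiate them into odds ratios with confidence intervals. A minimal sketch on simulated data (the effect sizes 0.5 and 0.1 are arbitrary assumptions); `confint.default()` gives Wald intervals based on the normal approximation.

```r
set.seed(3)
n <- 10000
score <- rnorm(n)
group <- rbinom(n, 1, 0.4)
# Hypothetical effects: 0.5 per unit of score, 0.1 for group membership
y <- rbinom(n, 1, plogis(0.5 * score + 0.1 * group))
fit <- glm(y ~ score + group, family = binomial)

# Odds ratios with Wald 95% intervals (exp of coefficient and its CI limits)
or <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or, 3)
```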

As far as more formal tests go, I would recommend BIC, as it tends to be conservative: if BIC favours the larger model, then just about any other test will too. This conservatism usually means low power when the sample size is small, but your sample size is large. You can show that using BIC is approximately the same as setting the significance level of a likelihood-ratio chi-square test to $\Pr(\chi_q^2>q\log N)$, where $q$ is the number of additional parameters in the larger model and $N$ is the sample size.
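The BIC threshold is easy to compute directly. The sketch below (the helper name `bic_alpha` is hypothetical) evaluates $\Pr(\chi_q^2>q\log N)$ and, on simulated data, checks that the difference in R's `BIC()` between two nested logistic models equals the likelihood-ratio statistic minus $q\log N$, so BIC prefers the larger model when that statistic exceeds $q\log N$.

```r
# Significance level implied by BIC for q extra parameters and N observations
bic_alpha <- function(q, N) pchisq(q * log(N), df = q, lower.tail = FALSE)
bic_alpha(1, 200000)   # roughly 0.00048

# Check BIC(m1) - BIC(m2) = LR - q * log(N) on simulated data (q = 1)
set.seed(4)
N <- 2000
score <- rnorm(N)
group <- rbinom(N, 1, 0.5)
y <- rbinom(N, 1, plogis(score + 0.2 * group))
m1 <- glm(y ~ score, family = binomial)
m2 <- glm(y ~ score + group, family = binomial)
lr <- deviance(m1) - deviance(m2)
c(bic_diff = BIC(m1) - BIC(m2), lr_minus_penalty = lr - log(N))
```

For $q=1$ and $N=200{,}000$ the implied significance level is far stricter than the conventional 0.05.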

Related Solutions

Solved – Why is power analysis with logistic regression so liberal compared to chi squared

The two tests (logistic regression and chi-square) are equivalent and a power analysis should give the same answer.

You are assuming that a value of 0.15 for $f^2$ and for $w$ represents the same effect size; it doesn't. By Cohen's conventions, a small $w$ is 0.1, whereas a small $f^2$ is 0.02.

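library(pwr)                              # cohen.ES() comes from the pwr package
cohen.ES(test = "chisq", size = "small")  # conventional small w = 0.1
cohen.ES(test = "f2",    size = "small")  # conventional small f2 = 0.02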
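To see how much the value plugged in for $w$ matters, here is a minimal base-R sketch of chi-square power using the noncentral $\chi^2$ distribution with noncentrality $Nw^2$. The helper `chisq_power` and the choice $N=100$ are assumptions for illustration; the `pwr` package's `pwr.chisq.test()` is built for the same kind of calculation.

```r
# Power of a chi-square test for Cohen's w: reject when the statistic exceeds
# the central critical value; under H1 it is noncentral with ncp = N * w^2.
chisq_power <- function(w, N, df, alpha = 0.05) {
  crit <- qchisq(alpha, df, lower.tail = FALSE)
  pchisq(crit, df, ncp = N * w^2, lower.tail = FALSE)
}
chisq_power(w = 0.10, N = 100, df = 1)   # Cohen's "small" w
chisq_power(w = 0.15, N = 100, df = 1)   # the value assumed in the question
```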

Edit: Elaborated on the similarity of the two approaches.

If you give the same data to logistic regression and to a chi-square test (strictly, without Yates' continuity correction), you get essentially the same result. Here's an example:

> set.seed(1234)
> x <- rbinom(100, 1, 0.2) 
> y <- rbinom(100, 1, 0.2) 
> chisq.test(table(x, y), correct=FALSE)

    Pearson's Chi-squared test

data:  table(x, y)
X-squared = 0.155, df = 1, p-value = 0.694

Warning message:
In chisq.test(table(x, y), correct = FALSE) :
  Chi-squared approximation may be incorrect
> summary(glm(y ~ x, family="binomial"))

Call:
glm(formula = y ~ x, family = "binomial")

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-0.753  -0.753  -0.753  -0.668   1.794  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -1.114      0.251   -4.43  9.4e-06 ***
x             -0.272      0.693   -0.39     0.69
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 110.22  on 99  degrees of freedom
Residual deviance: 110.06  on 98  degrees of freedom
AIC: 114.1

Number of Fisher Scoring iterations: 4
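Strictly, the Pearson $\chi^2$ (without continuity correction) coincides exactly with the Rao score test from the logistic regression, while the Wald $z$ test in the `summary()` output and the likelihood-ratio test are asymptotically equivalent approximations to it. A minimal sketch that checks this on the same simulated data, using `anova(..., test = "Rao")` from base R:

```r
set.seed(1234)
x <- rbinom(100, 1, 0.2)
y <- rbinom(100, 1, 0.2)

# Pearson chi-square without Yates' correction (small-count warning suppressed)
pearson <- suppressWarnings(chisq.test(table(x, y), correct = FALSE))$statistic

fit <- glm(y ~ x, family = binomial)
rao <- anova(fit, test = "Rao")$Rao[2]      # score test for adding x
lrt <- fit$null.deviance - fit$deviance     # likelihood-ratio statistic

c(pearson = unname(pearson), rao = rao, lrt = lrt)
```

The `pearson` and `rao` entries should agree to numerical precision; `lrt` will be close but not identical.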

The p-values agree (0.694 and 0.69), so the power should be the same. I can't remember the formulas for the two different versions of the effect size. Effect size measures are a little weird because in the old days you wanted to minimize the number of tables you put into books (so we have, for example, $f^2$ instead of $R^2$, even though there's a direct relationship between them and $R^2$ is what everyone understands).
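For reference, Cohen's standard definitions of the two effect sizes are given below, where $P_{0i}$ and $P_{1i}$ are the proportions in cell $i$ (of $m$ cells) under the null and alternative hypotheses:

$$
w = \sqrt{\sum_{i=1}^{m} \frac{(P_{1i} - P_{0i})^2}{P_{0i}}}, \qquad f^2 = \frac{R^2}{1 - R^2}.
$$

For a $2\times 2$ table, $w$ computed from the observed proportions equals $|\phi| = \sqrt{X^2/N}$. Inverting the second formula gives $R^2 = f^2/(1+f^2)$, so a small $f^2$ of 0.02 corresponds to $R^2 \approx 0.020$, and $f^2 = 0.15$ to $R^2 \approx 0.13$.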

Solved – Test if two samples follow the same distribution with Chi Squared in R

Given some set of cutpoints, the two-sample case becomes a chi-squared test of homogeneity of proportions (and this in turn is functionally identical to a test of independence in a $2\times k$ table).

How do I go about binning two samples using the same intervals?

1. Choose some set of bins (if possible without reference to the data, though in practice that may be difficult to accomplish unless you know beforehand roughly what the distribution is going to be).
2. For each sample, count the data in those bins.

In R you could use the cut function to set up the bins and the table function to count, but that's far from the only choice. If you really want hist to choose your bins, I'd combine the two samples into one for identifying the cut-offs, but you still have to go back and do the counts for the individual samples. This may also leave you with some small expected counts, but if you work with just the marginal distribution you can at least combine bins that way without looking at how the individual counts would have split up.

A worked example:

set.seed(7687120)                # make sure we look at the same numbers
x=rgamma(40,6,1/6)               # generate some x,y data 
y=rgamma(30,9,1/5)               # from different distributions
xy=c(x,y)                        # combine into one sample
hist(xy)                         # default hist bins not really suitable 
summary(xy)
hist(xy, breaks=seq(15,105,15))  # some small category counts at the top end
bks=c(15,30,45,60,105)           # -- push together everything above 60

table(cut(xy,breaks=bks))        # marginal totals look reasonable to me 
xc=table(cut(x,breaks=bks))      # calc. individual counts in table for x
yc=table(cut(y,breaks=bks))      # corresponding counts for y
rbind(xc,yc)                     # what the table looks like
chisq.test(rbind(xc,yc))         # testing the result
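The binning steps can be wrapped in a small reusable function. This is a sketch under assumptions: the name `binned_chisq` is hypothetical, the open-ended outer bins (`-Inf`, `Inf`) are chosen so that `cut()` does not drop any observation, and the optional Monte Carlo p-value (`simulate.p.value = TRUE` in `chisq.test`) is one common way to handle small expected counts. It reuses the simulated data from the worked example.

```r
# Bin two samples on the same fixed cut points and test homogeneity
# of the resulting 2 x k table.
binned_chisq <- function(x, y, breaks, simulate = FALSE) {
  tab <- rbind(x = table(cut(x, breaks = breaks)),
               y = table(cut(y, breaks = breaks)))
  test <- suppressWarnings(
    chisq.test(tab, simulate.p.value = simulate, B = 2000))
  list(table = tab, expected = test$expected, p.value = test$p.value)
}

set.seed(7687120)
x <- rgamma(40, 6, 1/6)
y <- rgamma(30, 9, 1/5)
bks <- c(-Inf, 30, 45, 60, Inf)              # open-ended outer bins (assumption)
res <- binned_chisq(x, y, bks, simulate = TRUE)
res$table
min(res$expected)                            # check for small expected counts
res$p.value                                  # Monte Carlo p-value
```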
