$\chi^2$ tests to compare the fit of logistic models on large samples

Tags: chi-squared-test, logistic, psychometrics

Does anyone know of any $\chi^2$ tests to compare the fit of logistic models which factor out the sample size? I'm dealing with a very large sample and I fear the significant $\chi^2$ test I get when adding a single variable to the model is simply the result of the sample size (>200,000 cases). I'm doing what is known as differential item functioning analysis with logistic regression. Basically it's as if I'm checking whether giving the right answer to a question (dependent variable) depends on your ethnicity when controlling for the total exam score.

Model 1: Q1 ~ TotalExamScore

Model 2: Q1 ~ TotalExamScore + Group

I'm basically using a chi-squared test to compare Model 1 to Model 2. The significance of the coefficient is not that important, but $\chi^2$ and sometimes $R^2$ are generally recommended for checking differential item functioning. My problem is that my sample is very large. In theory (for the question I'm considering) there should be no real difference across groups, so I suspect the result simply reflects the sensitivity of the $\chi^2$ test to sample size.

I'd rather use the whole dataset than take (small) random samples, as it is highly skewed. I've seen measures such as phi and Cramér's V for crosstabs, but I'm not sure whether they have been used with logistic regression, whether there are better alternatives, or whether any packages implement them (I generally use SPSS, Mplus, Stata, and R).

Best Answer

One heuristic way to take account of sample size is to create a random group variable which has the same marginal propensity as your "Group" variable. Then check the chi-square statistic for this random group. If it's greater than the chi-square for your variable, then you have a fair case to dismiss the effect as noise. A more robust version would be to create many noise variables and see if any of their chi-square statistics are greater than the chi-square for your variable.
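The noise-variable check can be sketched in a few lines. This is only an illustration under stated assumptions, not a recommended implementation: the data are simulated, the helper names (`solve`, `max_loglik`, `noise_benchmark`) are made up for this example, and the logistic fits use a plain Newton-Raphson loop so that the snippet needs nothing beyond the standard library. Shuffling the real group labels is used here as one way to get a noise variable with exactly the same group proportions. On real data you would do the same thing with the model-fitting routine of your usual package.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def max_loglik(X, y, iters=15):
    """Maximised log-likelihood of a logistic regression (Newton-Raphson)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 0.5 * (1.0 + math.tanh(0.5 * eta))  # stable form of the sigmoid
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # log(1 + exp(eta)) written so that it cannot overflow
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll


def noise_benchmark(score, group, y, n_noise=10, seed=0):
    """Return the LR chi-square for `group` and for shuffled copies of it."""
    rng = random.Random(seed)
    ll_base = max_loglik([[1.0, s] for s in score], y)  # Model 1: y ~ score

    def chi2(g):  # Model 2 vs Model 1: y ~ score + g
        full = [[1.0, s, gi] for s, gi in zip(score, g)]
        return 2.0 * (max_loglik(full, y) - ll_base)

    real = chi2(group)
    noise = []
    for _ in range(n_noise):
        fake = group[:]
        rng.shuffle(fake)  # same group proportions, no link to the response
        noise.append(chi2(fake))
    return real, noise


# Simulated example (assumption): the response depends on the score only.
rng = random.Random(1)
n = 1000
score = [rng.gauss(0.0, 1.0) for _ in range(n)]
group = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
y = [1.0 if rng.random() < 1 / (1 + math.exp(-(0.3 + s))) else 0.0 for s in score]

real, noise = noise_benchmark(score, group, y)
print(f"chi-square for Group: {real:.2f}")
print(f"noise variables exceeding it: {sum(v > real for v in noise)} of {len(noise)}")
```

If a sizeable share of the shuffled variables produce a chi-square at least as large as the one for the real variable, the real one is not distinguishable from noise. Note that each shuffled variable still costs one extra model fit, so with more than 200,000 cases a handful of noise variables may be all that is practical.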

Another thing you should do is examine the beta coefficients or "effect sizes" for the "Group" variable. Do they make intuitive sense? For example can you explain why a coefficient should be positive or negative? Can you explain why the magnitude of the coefficient should be bigger or smaller than the other coefficients?

As far as more formal tests go, I would recommend BIC, as it tends to be conservative. If BIC favours the larger model, then just about any other test will. This usually means "low power" when the sample size is small, but your sample size is large. You can show that using BIC is approximately the same as setting the significance threshold for the p-value in a likelihood ratio chi-square test to $\Pr(\chi_q^2 > q\log N)$, where $q$ is the number of additional parameters in the larger model and $N$ is the sample size.
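To see where that threshold comes from: with BIC defined as $-2\log L + k\log N$ for a model with $k$ parameters, the larger model has the lower BIC exactly when the likelihood ratio statistic $2(\log L_2 - \log L_1)$ exceeds $q\log N$. For $N = 200{,}000$ and $q = 1$ that cut-off is about 12.2, compared with 3.84 for a conventional 5% test. The sketch below computes the cut-off and the corresponding p-value threshold using only the standard library. The function names are invented for this example, and `chi2_sf` is a small hand-rolled upper-tail function for integer degrees of freedom, not a library routine.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1  # closed form for df = 1
    else:
        tail, k = math.exp(-half), 2             # closed form for df = 2
    while k < df:
        # sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        tail += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return tail


def bic_lr_cutoff(n, q=1):
    """LR chi-square value above which BIC prefers the larger model."""
    return q * math.log(n)


def bic_equivalent_alpha(n, q=1):
    """Significance level implied by the BIC cut-off: Pr(chi2_q > q log n)."""
    return chi2_sf(bic_lr_cutoff(n, q), q)


n_cases = 200_000  # sample size mentioned in the question
print(f"LR cut-off for q = 1: {bic_lr_cutoff(n_cases):.2f}")
print(f"equivalent p-value threshold: {bic_equivalent_alpha(n_cases):.2e}")
```

In other words, with this sample size an added "Group" term would need a likelihood ratio chi-square above roughly 12 before BIC prefers Model 2.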
