Logistic Regression – Interpreting Results of a Binary Logistic Regression

logisticregression

This is a basic question. I have been handed a binary logistic regression. The model has significant terms, but the goodness-of-fit tests indicate that the logit model is not appropriate. The author of the study argues that the goodness-of-fit results do not invalidate the relationship between the dependent variable and the predictors, only the model's ability to predict outcomes accurately. The argument is that since we were only interested in verifying that a relationship exists, not its magnitude, the result is conclusive.

I'm skeptical of this. Wouldn't it be more appropriate to say that the lack of fit does not necessarily invalidate the relationships? With a different link function, couldn't the observed and expected counts change enough to move some insignificant term to significance or vice versa?

Best Answer

Changing the link function could change which terms are significant. However, ignoring issues of multiple testing, only changing to a bad link function will make significance go away, and making the significance go away doesn't make the relationship go away. That is, I don't think that for a binary response you can have a significant result that is not real, even if the model is not of the best form.

The terms will only show as significant if they are significantly explaining variance in the response. They may not be used in the best possible way to do this, but they are doing something.

One way to see this is to think of your logistic regression not as a model in its own right, but just as an arbitrary data transformation. Suppose someone came along and handed you the responses and the binary predictions of the logistic regression (say, thresholded at 0.5). You now just have a single binary predictor $X$ for a binary response $Y$, which gives a $2 \times 2$ contingency table. There is no "goodness-of-fit" to worry about: the only possible model is $Y = X$, and each prediction is either right or wrong (all other structure has been removed by construction!). However, by virtue of the fact that the original logistic regression was significant, it must be the case that the contingency table is significant, i.e. $Y$ is related to $X$. Since $X$ is a function of your original predictor variables, it must also be the case that $Y$ is related to those original predictors.
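The thought experiment can be sketched in code. The following is a minimal illustration, not a proof of the general claim: it simulates one dataset in which the true response probability is a step function of a single predictor (so the logit form is wrong by construction), fits a logistic regression anyway, thresholds the fitted probabilities at 0.5, and tests the resulting $2 \times 2$ table. The data-generating process, the sample size, the seed, the helper name `fit_logistic`, and the use of plain gradient ascent and an uncorrected Pearson chi-square test are all assumptions made for this sketch; in practice you would use a statistics package.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by plain gradient ascent
    on the mean log-likelihood (a stand-in for a real fitting routine)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Simulated data (assumption): P(y=1) jumps from 0.1 to 0.9 at x = 0,
# which is a step function and not a logistic curve.
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(400)]
ys = [1 if rng.random() < (0.9 if x > 0 else 0.1) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ys)

# Binary prediction X: fitted probability > 0.5 is the same as b0 + b1*x > 0.
preds = [1 if b0 + b1 * x > 0 else 0 for x in xs]

# 2x2 contingency table: rows are the prediction X, columns are the response Y.
table = [[0, 0], [0, 0]]
for pred, y in zip(preds, ys):
    table[pred][y] += 1

# Pearson chi-square statistic for a 2x2 table (no continuity correction).
(a, b), (c, d) = table
n = a + b + c + d
denom = (a + b) * (c + d) * (a + c) * (b + d)
chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print("table:", table)
print("chi-square:", chi2, "p-value:", p_value)
```

In this simulated case the thresholded predictions remain strongly associated with the response even though the logit form is misspecified, which is the point the contingency-table argument is making.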

Of course, the converse is not true. Predictors that were not found to be significant could turn out to be significant if a better-fitting model were chosen. Also note that the coefficients of the fit are unlikely to be meaningful. A better-fitting model will have very different (most likely larger) strengths of relationship, and may well find that additional predictors are important. It is only the simple yes/no question "is this predictor related to the dependent variable?" whose answer must hold even with a poor model fit.
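As a rough illustration of the first point, here is a small sketch in which a predictor looks unrelated to the response under one model form and clearly related under a better-fitting one. Note the assumptions: it changes the functional form of the predictor (adding $x^2$) rather than the link function, the data are synthetic and deliberately symmetric in $x$, and the helpers `fit`, `log_lik` and `lr_pvalue` are hypothetical names written for this example. Terms are compared with likelihood-ratio tests on 1 degree of freedom.

```python
import math

def fit(rows, ys, lr=0.5, steps=2000):
    """Logistic regression by plain gradient ascent; rows are feature vectors."""
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for z, y in zip(rows, ys):
            eta = sum(b * v for b, v in zip(beta, z))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j, v in enumerate(z):
                grad[j] += (y - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def log_lik(rows, ys, beta):
    """Bernoulli log-likelihood, written in a numerically stable form."""
    total = 0.0
    for z, y in zip(rows, ys):
        eta = sum(b * v for b, v in zip(beta, z))
        total -= math.log1p(math.exp(-eta)) if y == 1 else math.log1p(math.exp(eta))
    return total

def lr_pvalue(ll_small, ll_big):
    """Likelihood-ratio test p-value for one extra parameter (chi-square, 1 df)."""
    stat = max(2.0 * (ll_big - ll_small), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))

# Synthetic data (assumption): y is mostly 1 when |x| > 1 and mostly 0 otherwise,
# with every tenth point flipped, and the same pattern mirrored for x and -x.
half = [0.01 + 0.02 * k for k in range(100)]
half_y = [(1 if h > 1 else 0) ^ (1 if k % 10 == 0 else 0) for k, h in enumerate(half)]
xs = [-h for h in half] + half
ys = half_y + half_y

null_rows = [[1.0] for _ in xs]            # intercept only
lin_rows = [[1.0, x] for x in xs]          # intercept + x
quad_rows = [[1.0, x, x * x] for x in xs]  # intercept + x + x^2

ll_null = log_lik(null_rows, ys, fit(null_rows, ys))
ll_lin = log_lik(lin_rows, ys, fit(lin_rows, ys))
ll_quad = log_lik(quad_rows, ys, fit(quad_rows, ys))

p_linear = lr_pvalue(ll_null, ll_lin)      # does x help in the linear-logit model?
p_quadratic = lr_pvalue(ll_lin, ll_quad)   # does adding x^2 help?

print("p-value for x in the linear model:", p_linear)
print("p-value for adding x^2:", p_quadratic)
```

Here the linear term shows no evidence of a relationship, yet the response clearly depends on $x$ once the model is allowed to bend. A non-significant term in a poorly fitting model is therefore weak evidence that the predictor is unrelated to the response.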

(The above comes with a caveat: stacking models one after the other like that is a bad idea, and it is meant only as a thought experiment.)