Solved – Use the likelihood-ratio test to select among nested models

likelihood-ratio, model-selection

Is it possible to do model selection in this way? Suppose I need to select a good logistic model from three variables (var1, var2, var3). The deviance D* = -2 * log-likelihood of the full model is the minimum among all candidate models. I could then fit all 6 proper sub-models (1, 2, 3, 12, 13, 23) and compute their deviances D_1, ..., D_6. Next I compute the differences ΔD_i = D_i - D*; under the null hypothesis that the dropped variables have no effect, ΔD_i follows a chi-square distribution with degrees of freedom equal to the difference in the number of variables. Any model whose ΔD_i falls below the 95th percentile of that chi-square distribution is not significantly worse than the full model, i.e., the deviance explained by the reduced model is not significantly different from that of the full model, so I would accept it as a good model. This way I could end up with several "good" models.
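For concreteness, here is a minimal sketch of that procedure in Python using statsmodels and scipy. It assumes a pandas DataFrame `df` with a binary outcome column `y` and the three predictors; all variable and column names here are hypothetical, not from any particular dataset:

```python
from itertools import combinations

import statsmodels.formula.api as smf
from scipy.stats import chi2

predictors = ["var1", "var2", "var3"]

# Fit the full model; its deviance D* = -2 * log-likelihood is the
# minimum over all candidate models.
full = smf.logit("y ~ " + " + ".join(predictors), data=df).fit(disp=0)
d_star = -2 * full.llf

# Fit all 6 non-empty proper sub-models and test each against the full model.
for k in (1, 2):
    for subset in combinations(predictors, k):
        sub = smf.logit("y ~ " + " + ".join(subset), data=df).fit(disp=0)
        delta_d = -2 * sub.llf - d_star          # LR statistic ΔD_i
        df_diff = len(predictors) - len(subset)  # difference in # of variables
        p = chi2.sf(delta_d, df_diff)            # upper-tail chi-square p-value
        # p > 0.05 means this sub-model is not significantly worse
        print(subset, round(delta_d, 2), round(p, 4))
```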

Is this a reasonable way to do model selection?

Thanks

Best Answer

Normally, looking at all subsets as you do in your example is not done because there are so many; with just three variables, however, it is feasible and would usually be done. Rather than using the deviance, which is always smallest for the full model, criteria like AIC or BIC are used because they penalize the log-likelihood for using many parameters; the model that minimizes the criterion (e.g. AIC, BIC) is chosen. Looking at variance explained means using R squared as the criterion. Maximizing adjusted R squared is sometimes used in regression for the same reason as the penalized likelihood measures (i.e., plain R squared is always maximized by the full model).
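As a sketch of this criterion-based alternative, under the same assumed DataFrame `df` as above (names hypothetical), statsmodels exposes AIC and BIC directly on fitted results, so ranking the candidate models is a one-liner per criterion:

```python
import statsmodels.formula.api as smf

formulas = ["y ~ var1", "y ~ var2", "y ~ var3",
            "y ~ var1 + var2", "y ~ var1 + var3", "y ~ var2 + var3",
            "y ~ var1 + var2 + var3"]

fits = {f: smf.logit(f, data=df).fit(disp=0) for f in formulas}

# AIC = -2*loglik + 2k and BIC = -2*loglik + k*log(n); smaller is better,
# so the penalty keeps the full model from winning automatically.
best_aic = min(fits, key=lambda f: fits[f].aic)
best_bic = min(fits, key=lambda f: fits[f].bic)
print("Best by AIC:", best_aic)
print("Best by BIC:", best_bic)
```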

Your idea of calling a model "good" when the extra variance explained by the full model is not statistically significantly higher than that of the given sub-model is a sensible way to identify a subset of "good" models.
