Solved – Stepwise regression for ordinal dependent variable with 3 levels

categorical data, ordered-logit, stepwise regression

I would like to perform a stepwise regression on a number of continuous independent variables to determine which ones best predict the dependent variable. The dependent variable is ordinal in nature with 3 categories. How can I do this, or do I have to somehow just dichotomize my DV so I can do a logistic stepwise regression?

Best Answer

Regarding the model: Don't make dummies out of your ordinal dependent variable, and don't dichotomize it. You need an ordinal logistic regression model, which keeps all three levels and uses their ordering. It's hard to answer fully without more details on your data or on which statistical package you use. If your dependent variable were nominal (unordered categories), you would use a multinomial logistic regression model instead. This is a decent tutorial on fitting and interpreting the ordinal model in R. Edit: Ordinal logistic regression with SAS, and Interpreting ordinal logistic output in SAS.
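To make the ordinal model concrete: in its proportional-odds form, a 3-level outcome gets two cutpoints and a single set of slopes, and the cumulative probabilities are P(Y <= j) = logistic(cutpoint_j - x·beta). Here is a minimal Python sketch of that calculation. The coefficients and cutpoints are made-up numbers for illustration, not estimates from any data, and the sign convention (cutpoint minus linear predictor) is one common choice; check which one your software uses.

```python
import math


def logistic(z):
    """Standard logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, beta, cutpoints):
    """Category probabilities under a proportional-odds model.

    Assumes P(Y <= j) = logistic(cutpoints[j] - sum(beta_k * x_k)),
    with cutpoints in increasing order. Returns one probability per
    ordered category (len(cutpoints) + 1 of them).
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities; the last category always closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


# Hypothetical values, for illustration only.
beta = [0.8, -0.5]       # one slope per continuous predictor
cutpoints = [-1.0, 1.0]  # two cutpoints separate three levels
probs = category_probs([1.2, 0.4], beta, cutpoints)
print([round(p, 3) for p in probs])
```

Note that the same slopes apply at both cutpoints. That is the proportional-odds assumption, and it is what lets the model use the ordering without splitting the outcome into separate binary problems.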

Regarding stepwise regression: Note that to find which of the covariates best predicts the dependent variable (that is, the relative importance of the variables) you don't need to perform a stepwise regression. You need standardized coefficients. In R you can get them by applying the scale() function to your predictors before fitting, and all statistical packages have equivalent (or easier) mechanisms. Because the standardized predictors are all on the same scale, comparing the absolute size of their coefficients will give you the answer. Stepwise regression answers a different question: it looks for the most parsimonious model, one that includes only the variables that improve the fit. However, it is not a recommended method, as it may not find the best model. You might prefer to choose variables based on theoretical considerations.
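For readers not using R, here is a rough Python sketch of what standardizing does: each predictor is centered at its mean and divided by its sample standard deviation (n - 1 denominator, which is also R's scale() default). The variable names and values are hypothetical. Only the continuous predictors are standardized; the ordinal dependent variable is left alone.

```python
import statistics


def standardize(column):
    """Center a column at mean 0 and rescale it to sample SD 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD, n - 1 denominator
    return [(value - mean) / sd for value in column]


# Hypothetical predictors measured on very different scales.
data = {
    "age_years": [23, 35, 41, 52, 60],
    "income_usd": [21000, 48000, 52000, 75000, 91000],
}

# After this step a one-unit change means "one standard deviation"
# for every predictor, so fitted coefficients become comparable.
scaled = {name: standardize(values) for name, values in data.items()}

for name, values in scaled.items():
    print(name, [round(v, 2) for v in values])
```

You would then fit the ordinal model on the scaled columns and compare the absolute sizes of the coefficients.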

Edit: regarding explained percent variance: If the previous method of finding relative importance is not good enough and you need the percent of variance explained by each variable, you are sadly out of luck. The OLS $R^2$ has no direct equivalent for logistic models; only pseudo $R^2$ measures are available. See this explanation for more details on pseudo $R^2$. From the UCLA stat help (from which all links here are taken):

The model estimates from a logistic regression are maximum likelihood estimates arrived at through an iterative process. They are not calculated to minimize variance, so the OLS approach to goodness-of-fit does not apply. However, to evaluate the goodness-of-fit of logistic models, several pseudo R-squareds have been developed. These are "pseudo" R-squareds because they look like R-squared in the sense that they are on a similar scale, ranging from 0 to 1 (though some pseudo R-squareds never achieve 0 or 1) with higher values indicating better model fit, but they cannot be interpreted as one would interpret an OLS R-squared and different pseudo R-squareds can arrive at very different values.
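As one concrete example, McFadden's pseudo $R^2$ compares the log-likelihood of the fitted model with that of an intercept-only model: 1 - LL_model / LL_null. Below is a small Python sketch under stated assumptions. The category counts and the fitted log-likelihood are invented numbers; in practice both log-likelihoods come from your software's output.

```python
import math


def null_log_likelihood(counts):
    """Log-likelihood of an intercept-only model for a categorical outcome.

    With no predictors, the best the model can do is predict each
    category at its observed share of the sample.
    """
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


# Hypothetical numbers, for illustration only.
counts = [40, 35, 25]  # observations in each of the 3 ordered levels
ll_null = null_log_likelihood(counts)
ll_model = -95.0       # assumed log-likelihood of a fitted ordinal model

r2 = mcfadden_r2(ll_model, ll_null)
print(round(ll_null, 2), round(r2, 3))
```

This gives one number for the whole model, not a share of variance per variable, and other pseudo $R^2$ definitions can give quite different values for the same fit.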