Solved – Independent variables in ordinal logistic regression

categorical-data, logistic, ordered-logit, regression

One of the IVs in my ordinal logistic regression is a nominal categorical variable with 4 categories. Most examples I see for this type of logistic regression have only binary categorical variables. I have tried with and without dummies for my 4-category independent… but I am not confident of the outcome. When I renumber the categories (switching them around as 1, 2, 3, 4, because they are not ordered in any way), I get different parameters and p-values each time. So something is wrong.

Best Answer

Let's think about regular linear regression and, to make it concrete, say we are trying to predict people's heights. When you regress height on just an intercept term with no predictors, the estimated intercept is the average height of all the people in your sample. Let's call this term $\beta_0^{\text{no predictor}}$.

Now we want to add a predictor for sex, so we create an indicator variable that takes the value 0 when the sampled person is male and 1 when the person is female. When we fit this model, we get estimates of an intercept term, $\beta_0^{\text{male reference}}$, and of the coefficient of the sex variable, $\beta_1^{\text{male reference}}$. The estimated intercept is no longer the average height of everybody but the average height of males, and the coefficient of the sex variable is the average height of females minus the average height of males.

Now suppose we code the indicator variable the other way round, so that the sex variable takes the value 0 if the person is female and 1 if the person is male. In this specification of the model we get the estimates of the intercept and coefficient $\beta_0^{\text{female reference}}, \beta_1^{\text{female reference}}$. Now $\beta_0^{\text{female reference}}$, the intercept term, is the average height of females, and the coefficient is the average height of males minus the average height of females. So

$$ \begin{align} \beta_1^{\text{male reference}} &= -\beta_1^{\text{female reference}}\\ \beta_0^{\text{male reference}} + \beta_1^{\text{male reference}} &= \beta_0^{\text{female reference}}\\ \beta_0^{\text{female reference}} + \beta_1^{\text{female reference}} &= \beta_0^{\text{male reference}} \end{align} $$
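The three identities above can be checked numerically. The following is a minimal sketch using only the Python standard library; the heights are made up for illustration, and `simple_ols` is a hypothetical helper implementing the closed-form least-squares fit for a single predictor.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up heights in cm, for illustration only.
sex = ["m", "m", "m", "f", "f", "f", "f"]
height = [178.0, 182.0, 175.0, 165.0, 160.0, 168.0, 163.0]

# Male reference: the indicator is 1 for females and 0 for males.
b0_male_ref, b1_male_ref = simple_ols([1 if s == "f" else 0 for s in sex], height)
# Female reference: the indicator is 1 for males and 0 for females.
b0_female_ref, b1_female_ref = simple_ols([1 if s == "m" else 0 for s in sex], height)

print("male reference:  ", b0_male_ref, b1_male_ref)
print("female reference:", b0_female_ref, b1_female_ref)
```

In both codings the intercept is the mean of the group coded 0 and the slope is the difference between the two group means, so the two fits describe exactly the same pair of averages.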

So, by changing how we coded the indicator variable, we changed the values of both the intercept and the coefficient, and this is exactly what we should want. With a multi-valued categorical variable you will see the same kinds of changes as you specify different reference levels, i.e. different choices of the category for which all the indicators take the value 0.
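To illustrate the multi-level case, here is a rough sketch with a four-level nominal predictor, again in plain Python and again using ordinary least squares rather than an ordinal model. The data, the level names `a` to `d`, and the helpers `solve`, `ols`, `dummy_design` and `fit` are all assumptions made for this example. It fits the same data with two different reference levels and compares the results.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def dummy_design(groups, reference):
    """Intercept column plus one 0/1 column for each non-reference level."""
    levels = [g for g in sorted(set(groups)) if g != reference]
    rows = [[1.0] + [1.0 if g == lev else 0.0 for lev in levels] for g in groups]
    return rows, levels

def fit(groups, y, reference):
    """Return (coefficients keyed by name, fitted values) for the chosen reference level."""
    X, levels = dummy_design(groups, reference)
    beta = ols(X, y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return dict(zip(["intercept"] + levels, beta)), fitted

# Made-up data: a nominal predictor with four unordered levels.
groups = ["a", "a", "b", "b", "c", "c", "d", "d"]
y = [10.0, 12.0, 20.0, 22.0, 15.0, 17.0, 30.0, 34.0]

coef_ref_a, fitted_ref_a = fit(groups, y, reference="a")
coef_ref_d, fitted_ref_d = fit(groups, y, reference="d")
print(coef_ref_a)
print(coef_ref_d)
```

The coefficients differ between the two fits because each one is a contrast against a different baseline, but the fitted values are the same group means in both cases. Different numbers after switching the reference level are therefore expected and do not by themselves indicate a problem.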

In the binary indicator case the p-value of the $\beta_1$ term should not change depending on how we code, but in the multi-valued case it will, because the p-value depends on the size of the estimated effect, and the average differences between each group and the reference group will likely change with the reference group. For example, suppose we have three groups: babies, teenagers, and adults. The average height difference between adults and teenagers will be smaller than that between adults and babies, so, all else being equal, the p-value for the coefficient of the indicator of being an adult versus a teenager should be greater than that for being an adult versus a baby.
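The babies, teenagers and adults example can be sketched numerically as well. The heights below are invented, and `t_statistics` is a hypothetical helper. It relies on the fact that, in a least-squares fit containing only group dummies, each coefficient is the group mean minus the reference mean, with a standard error based on the pooled residual variance. Because the standard library has no t-distribution, the sketch compares t-statistics rather than p-values; with the same residual degrees of freedom, a larger absolute t-statistic corresponds to a smaller p-value.

```python
from math import sqrt
from statistics import mean

# Made-up heights in cm for three groups, for illustration only.
heights = {
    "baby": [70.0, 74.0, 68.0, 72.0],
    "teenager": [160.0, 168.0, 172.0, 164.0],
    "adult": [170.0, 178.0, 174.0, 182.0],
}

def t_statistics(data, reference):
    """t-statistic of each dummy coefficient when `reference` is the baseline group."""
    means = {g: mean(v) for g, v in data.items()}
    n_total = sum(len(v) for v in data.values())
    # Residual sum of squares around each group's own mean.
    rss = sum((x - means[g]) ** 2 for g, v in data.items() for x in v)
    sigma2 = rss / (n_total - len(data))  # pooled residual variance estimate
    out = {}
    for g, v in data.items():
        if g == reference:
            continue
        diff = means[g] - means[reference]  # the dummy coefficient for group g
        se = sqrt(sigma2 * (1 / len(v) + 1 / len(data[reference])))
        out[g] = diff / se
    return out

t_ref_adult = t_statistics(heights, "adult")
t_ref_baby = t_statistics(heights, "baby")
print("reference = adult:", t_ref_adult)
print("reference = baby: ", t_ref_baby)
```

With adults as the reference, the teenager coefficient has a much smaller absolute t-statistic than the baby coefficient. Switching the reference to babies makes the teenager coefficient look far more significant, even though the data have not changed.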
