Solved – Ordered Probit and categorical variables

I wanted to run a quick and easy (or so I thought) regression on some data I have but now I am starting to doubt whether or not the regression makes any sense. I have seen some similar questions but they don't really answer what I want to know.

I want to run an ordered probit/logit with a dependent variable that has five ordinal outcomes. My explanatory variables are: age, work experience, gender, position within the organisation, country of origin and so on.

"Gender" will be a dummy variable and "Country of origin" will be coded using dummies.

"Age" is a categorical variable, so it will have values such as 18-25 years, 25-32 years, etc. Should I code each group as a dummy variable?

"Position within the organisation" has categories such as employee, executive and so on. Should I code each of these as dummy variables?

Work experience is in categories such as 1-2 years, 2-3 years, 5+ years, etc. Should I code these as dummy variables?

What I want to know is: when I only have dummy variables as explanatory variables, will I get meaningful results? For some reason I find it very unintuitive.

Best Answer

Results from an ordered logit/probit regression are always unintuitive, but categorical explanatory variables are as meaningful as continuous ones. I'd even say that they are easier to interpret.

For a concrete example, you could look at Dobson, An Introduction to Generalized Linear Models, 2nd ed., 2002, Chapter 8. In her "car preferences" example, the dependent variable is the importance of air conditioning and power steering (three levels: "no or little importance", "important", "very important") and the two explanatory variables are gender (male or female, coded as 1 and 0) and age (18-23, 24-40, >40, coded as two dummies, age2440 = 1 or 0 and agegt40 = 1 or 0, so that 18-23 is the reference group).
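
To make the coding concrete, here is a minimal sketch in Python (not the R I used for the fit) of reference-level dummy coding: a variable with k categories becomes k - 1 indicators, and the omitted category is the baseline. The function name `dummy_code` and the level labels are my own, purely for illustration.

```python
def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


# Age groups from the car preferences example; "18-23" is the baseline.
AGE_LEVELS = ["18-23", "24-40", ">40"]

for age in AGE_LEVELS:
    print(age, dummy_code(age, AGE_LEVELS, reference="18-23"))
```

The same pattern would apply to the age, position and work experience groups in the question: pick one category as the reference and create one dummy for each of the others.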

Fitting an ordered probit model (I used the polr() function from R's MASS package) gives:

Coefficients:
   male age2440 agegt40 
-0.3467  0.6817  1.3288 

Intercepts:
  NoImp|Imp Imp|VeryImp 
    0.01844     0.97594 
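
For reading this output, it helps to write down the model that polr() fits with the probit link. The notation is mine: x holds the dummies, beta the coefficients, zeta_j the intercepts (cutpoints), and Phi is the standard normal CDF, with zeta_0 = -infinity and zeta_J = +infinity.

```latex
P(Y \le j \mid x) = \Phi\left(\zeta_j - x^\top \beta\right),
\qquad
P(Y = j \mid x) = \Phi\left(\zeta_j - x^\top \beta\right) - \Phi\left(\zeta_{j-1} - x^\top \beta\right)
```

Note the minus sign in front of the linear predictor: a positive coefficient shifts probability towards the higher categories.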

Then you can compute the probabilities for women (male = 0) over 40 (age2440 = 0, agegt40 = 1):

NoImp     Imp VeryImp 
0.095   0.267   0.638

and for men over 40 (male = 1):

NoImp     Imp VeryImp 
0.168   0.330   0.502 

Their difference (women minus men) is the gender partial effect:

 NoImp     Imp VeryImp 
-0.073  -0.063   0.136
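
The numbers above can be checked by hand from the estimates. Below is a minimal sketch in Python using only the standard library (again, not the R code used for the fit); the names `COEF`, `CUTPOINTS` and `category_probs` are mine, and the computation assumes the polr() convention P(Y <= j) = Phi(zeta_j - x'beta).

```python
from statistics import NormalDist

# Estimates copied from the polr() output quoted above.
COEF = {"male": -0.3467, "age2440": 0.6817, "agegt40": 1.3288}
CUTPOINTS = [0.01844, 0.97594]  # NoImp|Imp, Imp|VeryImp
LABELS = ["NoImp", "Imp", "VeryImp"]


def category_probs(x):
    """P(Y = j) for each category, given a dict of 0/1 dummies."""
    eta = sum(COEF[name] * value for name, value in x.items())
    phi = NormalDist().cdf
    # Cumulative probabilities P(Y <= j) = Phi(zeta_j - eta), padded with 0 and 1.
    cum = [0.0] + [phi(z - eta) for z in CUTPOINTS] + [1.0]
    return [upper - lower for lower, upper in zip(cum, cum[1:])]


women = category_probs({"male": 0, "age2440": 0, "agegt40": 1})
men = category_probs({"male": 1, "age2440": 0, "agegt40": 1})
diff = [w - m for w, m in zip(women, men)]

for name, values in [("women >40", women), ("men >40", men), ("women - men", diff)]:
    print(name, dict(zip(LABELS, (round(v, 3) for v in values))))
```

The results should agree with the tables above up to rounding in the third decimal, since the differences shown there were taken from already rounded probabilities.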

I think that it's meaningful ;-)