Factor or ordered factor

categorical-data, r, regression

I don't understand when I should transform a qualitative ordinal independent variable into an unordered factor and when I should transform it into an ordered factor instead, when I'm performing a regression in R.
For example, let's consider these explanatory variables:

  • day of the week (Monday,…, Sunday): this is a qualitative variable. It is also ordinal because: Monday < Tuesday < … < Sunday.
  • month: this is a qualitative variable. It is also ordinal because: January < February < … < December.
  • year: this is a qualitative variable. But it is also ordinal because there's a natural ordering between years.
  • education (with levels "Low", "Medium", "High"): this is a qualitative variable. It is also ordinal because: Low < Medium < High.

From a statistical point of view, these variables are all qualitative ordinal; however, they are sometimes treated as unordered factors in R. Why? Is there a general rule that tells me when I should treat a qualitative ordinal explanatory variable as an ordered factor and when I shouldn't in a regression problem?
Thank you in advance.

Best Answer

It depends! Sometimes you would like to treat the levels of a categorical variable in a quantitative way, sometimes not.

Let's take your last example: say that you would like to predict math ability from length of education, and let's assume that every year of math education improves your ability linearly.

If every education level stands for one year of education, then one could fit a simple linear model. Coding the levels as a quantitative variable with the values 1, 2, 3 would make perfect sense.
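The one-year-per-level case can be sketched as follows. This is a minimal illustration in Python rather than R, on hypothetical data (the scores and the `fit_simple_ols` helper are made up for the example); in R the analogous fit would be something like `lm(score ~ x)` with `x` holding the numeric codes. Because each level is assumed to add exactly one year, a single slope captures the whole effect of education.

```python
# Hypothetical data: education level and a math score for each person.
# Assumption: each level stands for exactly one extra year of education.
code = {"Low": 1, "Medium": 2, "High": 3}  # quantitative coding, one unit per year

education = ["Low", "Low", "Medium", "Medium", "High", "High"]
score = [9.0, 11.0, 14.0, 16.0, 19.0, 21.0]

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Replace each level by its numeric code and fit one straight line.
x = [code[e] for e in education]
intercept, slope = fit_simple_ols(x, score)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=5.00, slope=5.00
```

Here the slope reads directly as "score gained per extra year (level)", which is only meaningful because the levels were assumed to be equally spaced.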

But let's suppose that level 1 represents 2 years of education, level 2 another 6 years and level 3 another 4 years, i.e. 2, 8 and 12 years in total. The 1, 2, 3 coding would now make less sense; a coding of 2, 8, 12 would make more sense. But instead of making all these assumptions, it would be smarter to fit a linear model with education modeled qualitatively using dummy coding.
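A small sketch of this comparison, again in Python on hypothetical data (scores assumed to rise linearly with cumulative years 2, 8, 12, plus some noise; the helper names are illustrative). It compares the residual sum of squares (RSS) of three codings: 1, 2, 3; the true years 2, 8, 12; and dummy coding. For dummy coding it relies on the standard fact that least squares with an intercept plus one indicator per non-reference level reproduces each level's mean; this is also what R does by default for an unordered factor in `lm`, via treatment contrasts.

```python
# Hypothetical data: cumulative years behind each level are
# Low = 2, Medium = 2 + 6 = 8, High = 8 + 4 = 12, and the score is assumed
# to rise linearly with years (plus a little noise).
education = ["Low", "Low", "Medium", "Medium", "High", "High"]
score = [5.0, 7.0, 14.0, 16.0, 20.0, 22.0]

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rss_numeric(coding):
    """RSS when the levels are replaced by the given numbers."""
    x = [coding[e] for e in education]
    a, b = fit_simple_ols(x, score)
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, score))

def rss_dummy():
    """RSS for dummy (treatment) coding.

    With an intercept plus one indicator per non-reference level, least
    squares fits each level's mean, so the group means are used directly
    instead of solving the normal equations.
    """
    groups = {}
    for e, y in zip(education, score):
        groups.setdefault(e, []).append(y)
    means = {e: sum(ys) / len(ys) for e, ys in groups.items()}
    return sum((y - means[e]) ** 2 for e, y in zip(education, score))

rss_123 = rss_numeric({"Low": 1, "Medium": 2, "High": 3})
rss_years = rss_numeric({"Low": 2, "Medium": 8, "High": 12})
rss_dum = rss_dummy()
print(rss_123, rss_years, rss_dum)
```

On this made-up data the 1, 2, 3 coding leaves an RSS of 9, while both the 2, 8, 12 coding and dummy coding leave about 6: dummy coding matches the correctly spaced coding without needing to know the spacing, at the cost of estimating one extra parameter.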

To sum up: if a qualitative predictor can be imagined as being on a ratio scale, then it is reasonable to treat it as a quantitative predictor. If the predictor can only be imagined on an interval scale, then one could consider the same, but this usually involves a lot of assumptions. In that case it might be better to represent it as qualitative.

HTH.
