Solved – Ordinal Probit model in plain English

ordered-probitregression

Suppose we have a model looking at the association between sodium intake $X$ and levels of body fat $Y$. So $Y$ is an ordinal variable that can take the integer values from $\{0,1,2 \}$. It is ordered according to increasing levels of body fat (e.g. "no bodyfat increase", "very little bodyfat increase", and "a lot of bodyfat increase").

What exactly does an ordinal probit model do? I plotted the data. So is an ordinal probit model trying to fit a line through the data points?

Best Answer

The reason one would use an OP is to study a categorical variables that is ordered, but where the actual values reflect only a ranking. For example, take bond ratings. There's an underlying variable that is unobserved called creditworthiness that some agency has divided into bins, which range from AAA, AA, A, BBB, and so on to D. You can imagine coding these as 12, 11, 10,.... Now AAA is better than AA, and AA is better than A, but the two differences are not equivalent. In your case, AAA is like "strong growth" and D is "no growth".

This unobserved creditworthiness (or BF growth) is function of the explanatory variables (like sodium) and parameters $\beta$ and a normally* distributed error $\varepsilon$. Each bond rating corresponded to a specific range of the creditworthiness. These ranges are not necessarily the same length. Suppose a firm is now at AA and becomes more creditworthy. Eventually, it would pass over the boundary between AA and AAA and the firm would get a new ranking. The ordered probit would estimate the parameters $\beta$ using MLE, together with the values of the boundary (aka cut values) defining the bins of the creditworthiness.

The interpretation of the parameters is a bit tricky since they are identified up to scale only. It's fairly easy to compare a ratio of two parameters to decide which one is more important. For a more involved exercise, you can also take the differences of adjacent cut values and divide by the sodium slope. This tells you the max change in sodium necessary to move out to the next bin. Alternatively, you can also look at the change in probability from being in a specific bin caused a change in sodium.


*If the error has a logistic distribution, you would have the ordered logit instead of the probit.

Related Question