I'm trying to do an ordered logit regression. I'm running the model like so (just a dumb little model estimating number of firms in a market from income and population measures). My question is about predictions.
nfirm.opr<-polr(y~pop0+inc0, Hess = TRUE)
pr_out<-predict(nfirm.opr)
When I run predict (which I'm trying to use to get the predicted y), the outputs are either 0, 3, or 27, which in no way reflects what should seem to be the prediction based upon my manual predictions from the coefficient estimates and intercepts. Does anyone know how get "accurate" predictions for my ordered logit model?
EDIT
To clarify my concern, my response data has observations across all the levels
>head(table(y))
y
0 1 2 3 4 5
29 21 19 27 15 16
where as my predict variable seems to be bunching up
> head(table(pr_out))
pr_out
0 1 2 3 4 5
117 0 0 114 0 0
Best Answer
To manually verify the predictions derived from using
polr()
from packageMASS
, assume a situation with a categorical dependent variable $Y$ with ordered categories $1, \ldots, g, \ldots, k$ and predictors $X_{1}, \ldots, X_{j}, \ldots, X_{p}$.polr()
assumes the proportional odds model$$ \text{logit}(p(Y \leqslant g)) = \ln \frac{p(Y \leqslant g)}{p(Y > g)} = \beta_{0_g} - (\beta_{1} X_{1} + \dots + \beta_{p} X_{p}) $$
For possible choices implemented in other functions, see this answer. The logistic function is the inverse of the logit-function, so the predicted probabilities $\hat{p}(Y \leqslant g)$ are
$$ \hat{p}(Y \leqslant g) = \frac{e^{\hat{\beta}_{0_{g}} - (\hat{\beta}_{1} X_{1} + \dots + \hat{\beta}_{p} X_{p})}}{1 + e^{\hat{\beta}_{0_{g}} - (\hat{\beta}_{1} X_{1} + \dots + \hat{\beta}_{p} X_{p})}} $$
The predicted category probabilities are $\hat{P}(Y=g) = \hat{P}(Y \leq g) - \hat{P}(Y \leq g-1)$. Here is a reproducible example in R with two predictors $X_{1}, X_{2}$. For an ordinal $Y$ variable, I cut a simulated continuous variable into 4 categories.
Now fit the proportional odds model using
polr()
and get the matrix of predicted category probabilities usingpredict(polr(), type="probs")
.To manually verify these results, we need to extract the parameter estimates, from these calculate the predicted logits, from these logits calculate the predicted probabilities $\hat{p}(Y \leqslant g)$, and then bind the predicted category probabilities to a matrix.
Compare to the result from
polr()
.For the predicted categories,
predict(polr(), type="class")
just picks - for each observation - the category with the highest probability.Compare to result from
polr()
.