Solved – Checking the proportional odds assumption holds in an ordinal logistic regression using polr function

assumptionslogisticordered-logitpolrr

I have used the ‘polr’ function in the MASS package to run an ordinal logistic regression for an ordinal categorical response variable with 15 continuous explanatory variables.

I have used the code (shown below) to check that my model meets the proportional odds assumption following advice provided in UCLA's guide. However, I’m a little worried about the output implying that not only are the coefficients across various cutpoints similar, but they are exactly the same (see graphic below).

FGV1b <- data.frame(FG1_val_cat=factor(FGV1b[,"FG1_val_cat"]), 
                    scale(FGV1[,c("X","Y","Slope","Ele","Aspect","Prox_to_for_FG", 
                          "Prox_to_for_mL", "Prox_to_nat_border", "Prox_to_village", 
                          "Prox_to_roads", "Prox_to_rivers", "Prox_to_waterFG", 
                          "Prox_to_watermL", "Prox_to_core", "Prox_to_NR", "PCA1", 
                          "PCA2", "PCA3")]))
b     <- polr(FG1_val_cat ~ X + Y + Slope + Ele + Aspect + Prox_to_for_FG + 
                            Prox_to_for_mL + Prox_to_nat_border + Prox_to_village + 
                            Prox_to_roads + Prox_to_rivers + Prox_to_waterFG + 
                            Prox_to_watermL + Prox_to_core + Prox_to_NR, 
              data=FGV1b, Hess=TRUE)

View a summary of the model:

summary(b)
(ctableb <- coef(summary(b)))
q        <- pnorm(abs(ctableb[, "t value"]), lower.tail=FALSE) * 2
(ctableb <- cbind(ctableb, "p value"=q))

And now we can look at the confidence intervals for the parameter estimates:

(cib <- confint(b)) 
confint.default(b)

But these results are still quite hard to interpret, so let's convert the coefficients into odds ratios

exp(cbind(OR=coef(b), cib))

Checking the assumption. So the following code will estimate the values to be graphed. First it shows us the logit transformations of the probabilities of being greater than or equal to each value of the target variable

FG1_val_cat <- as.numeric(FG1_val_cat)
sf <- function(y) {
  c('VC>=1' = qlogis(mean(FG1_val_cat >= 1)),
    'VC>=2' = qlogis(mean(FG1_val_cat >= 2)),
    'VC>=3' = qlogis(mean(FG1_val_cat >= 3)),
    'VC>=4' = qlogis(mean(FG1_val_cat >= 4)),
    'VC>=5' = qlogis(mean(FG1_val_cat >= 5)),
    'VC>=6' = qlogis(mean(FG1_val_cat >= 6)),
    'VC>=7' = qlogis(mean(FG1_val_cat >= 7)),
    'VC>=8' = qlogis(mean(FG1_val_cat >= 8)))
}
(t <- with(FGV1b, summary(as.numeric(FG1_val_cat) ~ X + Y + Slope + Ele + Aspect + 
                             Prox_to_for_FG + Prox_to_for_mL + Prox_to_nat_border + 
                             Prox_to_village + Prox_to_roads + Prox_to_rivers + 
                             Prox_to_waterFG + Prox_to_watermL + Prox_to_core + 
                             Prox_to_NR, fun=sf)))

The table above displays the (linear) predicted values we would get if we regressed our dependent variable on our predictor variables one at a time, without the parallel slopes assumption. So now, we can run a series of binary logistic regressions with varying cutpoints on the dependent variable to check the equality of coefficients across cutpoints

par(mfrow=c(1,1))
plot(t, which=1:8, pch=1:8, xlab='logit', main=' ', xlim=range(s[,7:8]))

polr assumption check

Apologies that I am no statistics expert and perhaps I am missing something obvious here. However, I have spent a long time trying to figure out if there is a problem in how I tested the model assumption and also trying to figure out other ways to run the same kind of model.

For example, I read in many help mailing lists that others use the vglm function (in the VGAM package) and the lrm function (in the rms package) (for example see here: Proportional odds assumption in ordinal logistic regression in R with the packages VGAM and rms). I have tried to run the same models but am continuously coming up against warnings and errors.

For example, when I try to fit the vglm model with the ‘parallel=FALSE’ argument (as the previous link mentions is important for testing the proportional odds assumption), I encounter the following error:

Error in lm.fit(X.vlm, y = z.vlm, …) : NA/NaN/Inf in 'y'
In addition: Warning message:
In Deviance.categorical.data.vgam(mu = mu, y = y, w = w, residuals = residuals, :
fitted values close to 0 or 1

I would like to ask please if there is anyone who might understand and be able to explain to me why the graph I produced above looks as it does. If indeed it means that something isn’t right, could you please help me find a way to test the proportional odds assumption when just using the polr function. Or if that is just not possible, then I will resort to trying to use the vglm function, but would then need some help to explain why I keep getting the error given above.

NOTE: As a background, there are 1000 datapoints here, which are actually location points across a study area. I am looking to see if there are any relationships between the categorical response variable and these 15 explanatory variables. All of those 15 explanatory variables are spatial characteristics (for example, elevation, x-y coordinates, proximity to forest etc.). The 1000 datapoints were randomly allocated using a GIS, but I took a stratified sampling approach. I made sure that 125 points were randomly chosen within each of the 8 different categorical response levels. I hope this information is also helpful.

Best Answer

The dependent variable has 8 ordered levels so in the graph to test the proportional odds assumption you should see 8 different symbols for every independent variable. You see only 2 symbols for every independent variable probably because you have chosen a too short interval for the values of the x-axis. If my conjecture is correct, you just need to use a wider interval for the values of the x-axis. Try this code:

par(mfrow=c(1,1))
plot(t, which=1:8, pch=1:8, xlab='logit', main=' ', xlim=range(s[,3:9]))
Related Question