I've been having struggling with the analysis of ordinal categorical data that I have (against 3 binary categorical explanatory variables). The problem is that for one of my data sets,running ordinal logistic regression in SPSS gives an unusual result, which raises questions about the validity of this statistical testing approach to my data:
Full SPSS syntax and output can be accessed here
The question arises: why is no statistical result given for the non-redundant level of the age variable, StErr = 0, confidence interval does not exist, logit Estimate so high…? Exponentiation of logit gives some ridiculously high odds ratio… It’s like a perfect result with 100% certainty? Is this statistical test and outcome valid? Can someone explain what is happening in layman's terms? Should I be using a different statistical test than ordinal logistic regression? And please suggest how I should analyse and report it for a scientific publication.
The raw data can be viewed for further insight:
I have also made available the SPSS data file in case you wanted to run the test for yourself:
I have also attached a visualisation of the data in the form of stacked columns, to help you to quickly familiarise yourself with the character of my data.
Best Answer
The problem here is what is called separation. For one level of your predictor (age) you have zero occurrences of one level of your outcome (collagen). The program is driving its estimate of the coefficient upwards to infinity and in this case stopping at an arbitrary value which as you point out when exponentiated gives you a very large odds ratio. What you do about it is really a substantive question to do with the underlying science rather than a statistical one.
Added as edit
You ask what separation means. In logistic regression it means that for some value of the predictor (or some linear combination of the predictors) only one value of the outcome occurs. You have a variant of this because you are using an ordinal model rather than logistic regression. What you do is a matter of your underlying science. Finding a perfect predictor may be really desirable for you or you may feel that this is just a chance finding caused by your sample size. There are models for logistic regression (exact logistic regression, Firth's bias corrected method) but I do not know whether they are available for ordinal models.
Further addition
Investigating this further with two R packages (I do not use SPSS) reveals that they agree on there being a problem but give slightly different output. The function
polr
from MASS basically agrees with the output in the original question and inspecting the variance-covariance matrix of the coefficients shows that the estimated correlation between age and one of the intercepts is close to 1 in absolute value.The function
clm
from the ordinal package rejects the model on the grounds that the Hessian is singular. Using @FrankHarrell suggestion to investigate LRT we find that age is indeed significant.Note the a LRT of age is possible here.
My feeling is that there is insufficient information here in the data-set to fit the model with any confidence. If we had a single observation for age == old & coll == 0 then both functions would fit the model (and give identical results) so the problem here is the zeroes for both (age == old & coll == 0) & (age == young & coll == 2).