The fact that these coefficients are all for factors in R means that the Intercept is the log-odds for the event, i.e. log(proportion with event / proportion without), for subjects who have all their factor values at the lowest level. We know that of the 1615 in level 1 of the factor under scrutiny, 1088 survived, although 1088/(1615-1088) (= the odds of survival given factor = 1) will not necessarily match exp(Intercept), since not all of those people also had the other factors at their lowest level. In fact the odds of survival (or it could be death, since it has not really been made clear how the events were coded) were quite different. In the best case of all factors being = 1, the odds of survival were:
exp( 4.2284770 )
[1] 68.61266
That's actually a pretty low odds for newborn survival, so this must have been a NICU study or something happening in a third-world country. But it's way higher than the odds for all children who had that value along with any other values for the other predictors. (After looking at the single line of data it was neither ... it was a different species.) In R the way to get a quick estimate of survival probability would be:
(pred1s <- predict(mod, newdata=as.data.frame( with( lesna,
list( seas = 1 ,
btw=1 , prectem5 =1 ,
pcscore =1 , pindx5 =1 ,
presp2= 1 , ppscore= 1,
mtone2 = 1 , fos = 1 ,
psex = 1 , pscolor = 1,
pshiv =1 , backfat5= 1 ,
srect2 =levels(srect2)[1] , gest3 =1 ,
int3= 1 , agit= 1 , tacc =1 ) )
),
type="response" )
) # the outer parens are to get a value to print when the assignment is made
You can then compare exp(Intercept) to that (atypical) prediction divided by (1 - prediction) ... since that's what odds are. So I hope it's becoming clear: you need to specify all of the factor level values at once to get a prediction. Or you need to create a synthetic cohort with a specific composition of all factor levels. You cannot take a single factor distribution and create a prediction from such a complex model unless you specify some sort of average value for all the other factors.
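To make that comparison concrete, the arithmetic for turning a predicted probability into odds can be sketched as follows (Python is used purely for the arithmetic; the probability value is a made-up stand-in for whatever predict() actually returns):

```python
import math

# Hypothetical predicted probability; substitute the value that
# predict(mod, ..., type="response") returns in R.
pred = 0.8087

odds = pred / (1 - pred)     # odds implied by the prediction
log_odds = math.log(odds)    # back on the linear-predictor (logit) scale

# exp(Intercept) equals these odds only when every predictor is at its
# baseline level, as discussed above.
```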
Edit after looking at the single line of data:
Some (most, in fact) of those variables were not factors (and I guess I should have recognized that), which means they do not have any levels in your dataframe, but do have levels in the model matrix. I had assumed that the levels would be attributes of the factor variable, but I was wrong. It's going to be easier to work with this model if the 'newdata' argument given to predict
has the same structure as the original dataframe, and I would seriously consider making a copy and turning all those items into factors. With the exception of "srect2" we can change all of those items to 1, at least under the assumption that that is the minimum value for each of those variables. If it's not, then you need to use the minimum value. Code edited.
response value calculated:
1.4418 is the log-odds (the Intercept in the linear predictor for the baseline category)
odds = Pr(X=1)/(1-Pr(X=1)) :: definition
log(odds) = log(Pr(X=1)/(1-Pr(X=1)) ) = 1.4418 :: starting point
Solve for Pr(X=1) ... should be = the calculated "response" value.
exp(1.4418)*(1- Pr(X=1) ) = Pr(X=1)
exp(1.4418) = (1+exp(1.4418))* Pr(X=1)
Pr(X=1) = exp(1.4418) / (1+exp(1.4418))
> exp(1.4418) / (1+exp(1.4418))
[1] 0.8087332
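The same inversion can be checked numerically outside R; a quick sketch (Python, purely for the arithmetic, using the 1.4418 intercept quoted above):

```python
import math

log_odds = 1.4418                                   # Intercept from above
p = math.exp(log_odds) / (1 + math.exp(log_odds))   # inverse logit

# Round trip: the log of the implied odds recovers the intercept,
# and p comes out at about 0.8087, matching the "response" value.
assert abs(math.log(p / (1 - p)) - log_odds) < 1e-12
```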
All of the other levels need to have the Intercept added to get their correct linear predictors. A difference in log-odds, i.e. a coefficient, corresponds directly to a ratio on the odds scale, hence exp(coef) gives a set of odds ratios.
{ log(x) - log(y) = c } <=> { x/y = exp(c) }
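A numeric illustration of that difference-in-log-odds identity (Python, with a made-up coefficient just to show the pattern):

```python
import math

intercept = 1.4418   # baseline log-odds, from above
beta = 0.7           # hypothetical coefficient for some factor level

odds_base = math.exp(intercept)          # odds at baseline
odds_level = math.exp(intercept + beta)  # odds with the coefficient added

# The ratio of the two odds is exp(beta): the odds ratio.
ratio = odds_level / odds_base
```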
--
The odds is not the same as the probability. The odds is the number of "successes" (deaths) per "failure" (continue to live), while the probability is the proportion of "successes". I find it instructive to compare how one would estimate these two: An estimate of the odds would be the ratio of the number of successes over the number of failures, while an estimate of the probability would be the ratio of the number of success over the total number of observations.
Odds and probabilities are both ways of quantifying how likely an event is, so it is not surprising that there is a one to one relation between the two. You can turn a probability ($p$) into an odds ($o$) using the following formula: $o=\frac{p}{1-p}$. You can turn an odds into a probability like so: $p = \frac{o}{1+o}$.
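Those two conversion formulas can be written down directly; a minimal sketch (Python):

```python
def prob_to_odds(p):
    """o = p / (1 - p)"""
    return p / (1 - p)

def odds_to_prob(o):
    """p = o / (1 + o)"""
    return o / (1 + o)

# p = .5 corresponds to even odds (o = 1), and the two functions
# are inverses of each other.
```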
So to come back to your example:
- The baseline probability is .5, so you would expect to find 1 failure per success, i.e. the baseline odds is 1. This odds is multiplied by a factor 5.8, so then the odds would become 5.8, which you can transform back to a probability as: $\frac{5.8}{1+5.8}\approx.85$ or 85%
- A two degree change in temperature is associated with a change in the odds of death by a factor of $5.8^2\approx33.6$. So the baseline odds is still 1, which means the new odds would be 33.6, i.e. you would expect 33.6 dead fish for every live fish, and the probability of finding a dead fish is $\frac{33.6}{1+33.6} \approx .97$
- A three degree change in temperature leads to a new odds of death of $1\times 5.8^3\approx195$. So the probability of finding a dead fish is $\frac{195}{1+195}\approx.99$
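The three bullet points above follow one pattern: multiply the baseline odds by the odds ratio raised to the number of degrees, then convert back to a probability. Sketched in Python with the numbers from the example:

```python
baseline_odds = 1.0   # baseline probability .5 corresponds to odds of 1
odds_ratio = 5.8      # per one-degree change in temperature

probs = {}
for degrees in (1, 2, 3):
    odds = baseline_odds * odds_ratio ** degrees
    probs[degrees] = odds / (1 + odds)
# reproduces the .85, .97 and .99 probabilities in the bullets above
```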
--
Best Answer
That is a huge association. It goes from basically everybody below among those who did not attend to everybody above among those who did.
Fitting the model:
gives
As noted, an OR of 18 ($\approx \exp(3)$).
Typing summary for the model gives one way of doing inference: the two-sided Wald $p$-value is:
2*pt(44.1, df=3, lower.tail=F) = 0.000025
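That pt() tail probability can be reproduced without R: the t distribution with 3 degrees of freedom has a closed-form CDF, so a quick check (Python, standard library only) is:

```python
import math

def t3_upper_tail(t):
    """P(T > t) for a t distribution with 3 df, using its closed-form CDF:
    F(t) = 1/2 + (1/pi) * (atan(t/sqrt(3)) + sqrt(3)*t / (t*t + 3))."""
    return 0.5 - (math.atan(t / math.sqrt(3))
                  + math.sqrt(3) * t / (t * t + 3)) / math.pi

p_two_sided = 2 * t3_upper_tail(44.1)   # roughly 2.6e-05
```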
As you note, MASS does not calculate $p$-values this way because the "intercept(s)" terms do not have the same mathematical properties as the intercept in a logistic model, so you don't know what their distribution and standard error would be if the null hypothesis were true. Fitting the reduced model and comparing it to the full model with a LRT is the way to overcome this. If I fit intercept only:
i <- polr( exam ~ 1, weights=w)
then
anova(f, i)
is:
As you note, the huge association produces a huge test statistic, with an unsurprising result. The $p$-value of 0 is just a consequence of rounding; $p$-values are never exactly 0. Reporting it out to 3 (or even 2) digits, e.g. $p < 0.001$ or $p < 0.01$, suffices, especially since significance testing is more concerned with whether the alpha level is met or exceeded than with the precise value of the $p$-value.
The interpretation of the coefficient is:
The odds of achieving a more desirable exam performance rating for a student who "attended" (having clarified beforehand how "attended" was defined) were 18 times the odds for a student who did not.
Since the categories are so few, you can also just summarize the predictions:
You can say that more than 50% of those who did not attend were below average, whereas only 6% of those who did attend were; and that only 2% of those who did not attend were above expectation, versus 31% of those who did attend.
Another implementation of proportional odds that has more "off the shelf" functionality comes from Frank Harrell's rms package, specifically the lrm function. Fitting it gives the same Wald and LRT statistics that I calculated before.