Solved – How to explain the logistic regression model objects from the rms::lrm() function in R

logisticr

Recently I was trying to do logistic regression using the rms::lrm() function. But I had some trouble understanding the model objects from the function. Here is the example from the package:

#dataset
n            <- 1000    # define sample size
set.seed(17)            # so can reproduce the results
treat        <- factor(sample(c('a','b','c'), n,TRUE))
num.diseases <- sample(0:4, n,TRUE)
age          <- rnorm(n, 50, 10)
cholesterol  <- rnorm(n, 200, 25)
weight       <- rnorm(n, 150, 20)
sex          <- factor(sample(c('female','male'), n,TRUE))
L            <- .1*(num.diseases-2) + .045*(age-50) +
                (log(cholesterol - 10)-5.2)*(-2*(treat=='a') +
                3.5*(treat=='b')+2*(treat=='c'))
y            <- ifelse(runif(n) < plogis(L), 1, 0)
#fit model
g            <- lrm(y ~ treat*rcs(age))

> g

Logistic Regression Model

lrm(formula = y ~ treat * rcs(age))

                      Model Likelihood     Discrimination    Rank Discrim.    
                         Ratio Test            Indexes          Indexes       
Obs          1000    LR chi2      76.77    R2       0.099    C       0.656    
 0            478    d.f.            14    g        0.665    Dxy     0.312    
 1            522    Pr(> chi2) <0.0001    gr       1.945    gamma   0.314    
max |deriv| 3e-06                          gp       0.156    tau-a   0.156    
                                           Brier    0.231    
> anova(g)
                Wald Statistics          Response: y 

 Factor                                     Chi-Square d.f. P     
 treat  (Factor+Higher Order Factors)        5.62      10   0.8462
  All Interactions                           1.30       8   0.9956
 age  (Factor+Higher Order Factors)         65.99      12   <.0001
  All Interactions                           1.30       8   0.9956
  Nonlinear (Factor+Higher Order Factors)    2.23       9   0.9872
 treat * age  (Factor+Higher Order Factors)  1.30       8   0.9956
  Nonlinear                                  0.99       6   0.9858
  Nonlinear Interaction : f(A,B) vs. AB      0.99       6   0.9858
 TOTAL NONLINEAR                             2.23       9   0.9872
 TOTAL NONLINEAR + INTERACTION               2.57      11   0.9953
 TOTAL                                      69.06      14   <.0001

Here are my questions:
For the object g,

What does the max |deriv| 3e-06 mean?
What do the Discrimination and Rand Discrim. Indexes suggest?

For the anova(g) object,

What's the Factor +Higher Order Factors for the treat?
Why there are two all interactions? How to explain the nonlinear parts?

Best Answer

Max |deriv| is the maximum (over $\beta$s) of the absolute value of the first derivative of the log-likelihood function at the apparent maximum likelihood estimates. Being close to zero is a good indication that convergence happened. There are a lot of background sources explaining the various indexes, especially my course notes at http://biostat.mc.vanderbilt.edu/CourseBios330. To interpret anova, type ?print.anova.rms to see the various printing options for anova objects. These options include printing subscripts of $\beta$s that are tested in a given row, or printing a matrix of asterisks to the right of the statistics with blank spaces for $\beta$s not included in the current test.

Related Solutions

Solved – How to understand output from R’s polr function (ordered logistic regression)

I would suggest you look at books on categorical data analysis (cf. Alan Agresti's Categorical Data Analysis, 2002) for better explanation and understanding of ordered logistic regression. All the questions that you ask are basically answered by a few chapters in such books. If you are only interested in R related examples, Extending Linear Models in R by Julian Faraway (CRC Press, 2008) is a great reference.

Before I answer your questions, ordered logistic regression is a case of multinomial logit models in which the categories are ordered. Suppose we have $J$ ordered categories and that for individual $i$, with ordinal response $Y_i$, $p_{ij}=P(Yi=j)$ for $j=1,..., J$. With an ordered response, it is often easier to work with the cumulative probabilities, $\gamma_{ij}=P(Y_i \le j)$. The cumulative probabilities are increasing and invariant to combining adjacent categories. Furthermore, $\gamma_{iJ}=1$, so we need only model $J–1$ probabilities.

Now we want to link $\gamma_{ij}$s to covariates $x$. In your case, Sat has 3 ordered levels: low, medium, high. It makes more sense to treat them as ordered rather than unordered. The remaining variables are your covariates. The specific model that you are considering is the proportional odds model and is mathematically equivalent to:

$$\mbox{logit } \gamma_j(x_i) = \theta_j - \beta^T x_i, j = 1 \ldots J-1$$ $$\mbox{where }\gamma_j(x_i)=P(Y_i \le j | x_i)$$

It is so called because the relative odds for $Y \le j$ comparing $x_1$ and $x_2$ are:

$$\left(\frac {\gamma_j(x_1)}{1-\gamma_j(x_1)}\right) / \left(\frac {\gamma_j(x_2)}{1-\gamma_j(x_2)}\right)=\exp(-\beta^T (x_1-x_2))$$

Notice, the above expression does not depend on $j$. Of course, the assumption of proportional odds does need to be checked for a given dataset.

Now, I will answer some (1, 2, 4) questions.

How can one understand if the model gave a good fit? summary(house.plr) shows Residual Deviance 3479.149 and AIC (Akaike Information Criterion?) of 3495.149. Is that good? In the case those are only useful as relative measures (i.e. to compare to another model fit), what is a good absolute measure? Is the residual deviance approximately chi-squared distributed? Can one use "% correctly predicted" on the original data or some cross-validation? What is the easiest way to do that?

A model fit by polr is a special glm, so all the assumptions that hold for a traditional glm hold here. If you take care of the parameters properly, you can figure out the distribution. Specifically, to test the if the model is good or not you may want to do a goodness of fit test, which test the following null (notice this is subtle, mostly you want to reject the null, but here you don't want to reject it to get a good fit):

$$H_o: \mbox{ current model is good enough }$$

You would use the chi-square test for this. The p-value is obtained as:

1-pchisq(deviance(house.plr),df.residual(house.plr))

Most of the time you'd hope to obtain a p-value greater than 0.05 so that you don't reject the null to conclude that the model is good fit (philosophical correctness is ignored here).

AIC should be high for a good fit at the same time you don't want to have a large number of parameters. stepAIC is a good way to check this.

Yes, you can definitely use cross validation to see if the predictions hold. See predict function (option: type = "probs") in ?polr. All you need to take care of is the covariates.

What information does pr contain? The help page on profile is generic, and gives no guidance for polr

As pointed by @chl and others, pr contains all the information needed for obtaining CIs and other likelihood related information of the polr fit. All glms are fit using iteratively weighted least square estimation method for the log likelihood. In this optimization you obtain a lot of information (please see the references) which will be needed for calculating Variance Covariance Matrix, CI, t-value etc. It includes all of it.

How does one interpret the t values for each coefficient? Unlike some model >fits, there are no P values here.

Unlike normal linear model (special glm) other glms are don't have the nice t-distribution for the regression coefficients. Therefore all you can get is the parameter estimates and their asymptotic variance covariance matrix using the max-likelihood theory. Therefore:

$$\text{Variance}(\hat \beta) = (X^T W X)^{-1}\hat \phi$$

Estimate divided by its standard error is what BDR and WV call t-value (I am assuming MASS convention here). It is equivalent to t-value from normal linear regression but does not follow a t-distribution. Using CLT, it is asymptotically normally distributed. But they prefer not to use this approx (I guess), hence no p-values. (I hope I am not wrong, and if I am, I hope BDR is not on this forum. I further hope, someone will correct me if I am wrong.)

Solved – How to explain smoothing functions in the logistic regression model

The reference age is the age that your function cuts the x-axis (around 67, maybe, by eyeballing your graph). Whatever, let's say that it is 67. The odds ratio is the odds of the event (probability of the event divided by one minus the probability of the event), given the person's age divided by the odds of the event given the age=67: \begin{equation} \frac{\frac{P\{E|age\}}{1-P\{E|age\}}}{\frac{P\{E|67\}}{1-P\{E|67\}}}= \frac{exp(f(age))}{exp(f(67))}=exp(f(age)) \end{equation} So, the odds ratio for an 18-year-old relative to a 67-year-old would be $exp(0.8)=2.22$, or the 18-year-old has odds of the event 222% as high as the 67-year-old.

If you don't want the reference age to be 67, then you can make it anything you like via subtraction. If you want the reference age to be 18, then just subtract 0.8 from the value of f(age) for every value of age. Then the odds ratio for a 27-year-old, compared to (now) an 18-year-old is $exp(0.39-0.80)=0.66$, or the 27-year-old has odds 66% as high as the 18-year-old to have the event.

Best Answer

Related Solutions

Solved – How to understand output from R’s polr function (ordered logistic regression)

Solved – How to explain smoothing functions in the logistic regression model

Related Question