Solved – How to calculate pseudo-$R^2$ from R’s logistic regression

Tags: likelihood, logistic, pseudo-r-squared, r

Christopher Manning's writeup on logistic regression in R fits a model as follows:

ced.logr <- glm(ced.del ~ cat + follows + factor(class), 
  family=binomial)

Some output:

> summary(ced.logr)
Call:
glm(formula = ced.del ~ cat + follows + factor(class),
    family = binomial("logit"))
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.24384  -1.34325   0.04954   1.01488   6.40094

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.31827    0.12221 -10.787 < 2e-16
catd          -0.16931    0.10032  -1.688 0.091459
catm           0.17858    0.08952   1.995 0.046053
catn           0.66672    0.09651   6.908 4.91e-12
catv          -0.76754    0.21844  -3.514 0.000442
followsP       0.95255    0.07400  12.872 < 2e-16
followsV       0.53408    0.05660   9.436 < 2e-16
factor(class)2 1.27045    0.10320  12.310 < 2e-16
factor(class)3 1.04805    0.10355  10.122 < 2e-16
factor(class)4 1.37425    0.10155  13.532 < 2e-16
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 958.66  on 51  degrees of freedom
Residual deviance: 198.63  on 42  degrees of freedom
AIC: 446.10

Number of Fisher Scoring iterations: 4

He then goes into some detail about how to interpret coefficients, compare different models, and so on. Quite useful.

However, how much variance does the model account for? A Stata page on logistic regression says:

Technically, $R^2$ cannot be computed the same way in logistic regression as it is in OLS regression. The pseudo-$R^2$, in logistic regression, is defined as $1 - \frac{L_1}{L_0}$, where $L_0$ represents the log likelihood for the "constant-only" model and $L_1$ is the log likelihood for the full model with constant and predictors.

I understand this at a high level. The constant-only model would include none of the predictors (only the intercept term), and log likelihood measures how closely the fitted parameters match the data. In fact, Manning hints that the deviance might be $-2 \log L$. Perhaps the null deviance is $-2 \log L$ of the constant-only model and the residual deviance is $-2 \log L$ of the full model? However, I'm not crystal clear on it.
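If that reading is right, the Stata-style pseudo-$R^2$ should follow directly from the two deviances printed above. Here is the arithmetic I have in mind (assuming deviance really is $-2 \log L$, which I gather is exact for ungrouped binary responses, where the saturated model's log likelihood is zero):

L0 <- -958.66 / 2  # null deviance     -> log likelihood of the constant-only model
L1 <- -198.63 / 2  # residual deviance -> log likelihood of the full model
1 - L1/L0          # Stata-style pseudo-R^2, about 0.793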

Can someone verify how one actually computes the pseudo-$R^2$ in R using this example?

Best Answer

Don't forget the rms package, by Frank Harrell. You'll find everything you need for fitting and validating GLMs.

Here is a toy example (with only one predictor):

set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
# true success probabilities on the inverse-logit scale
p <- exp(a + b*x) / (1 + exp(a + b*x))
# draw Bernoulli outcomes with probability p
y <- factor(ifelse(runif(n) < p, 1, 0), levels=0:1)
mod1 <- glm(y ~ x, family=binomial)
summary(mod1)

This yields:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8959     0.1969    4.55 5.36e-06 ***
x            -1.8720     0.2807   -6.67 2.56e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 258.98  on 199  degrees of freedom
Residual deviance: 181.02  on 198  degrees of freedom
AIC: 185.02

Now, using the lrm function,

require(rms)
mod1b <- lrm(y ~ x)

Printing the fitted object with print(mod1b) gives a lot of model fit indices, including Nagelkerke's $R^2$:

Logistic Regression Model

lrm(formula = y ~ x)

                      Model Likelihood     Discrimination    Rank Discrim.    
                         Ratio Test            Indexes          Indexes       

Obs           200    LR chi2      77.96    R2       0.445    C       0.852    
 0             70    d.f.             1    g        2.054    Dxy     0.705    
 1            130    Pr(> chi2) <0.0001    gr       7.801    gamma   0.705    
max |deriv| 2e-08                          gp       0.319    tau-a   0.322    
                                           Brier    0.150                     


          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept  0.8959 0.1969  4.55  <0.0001 
x         -1.8720 0.2807 -6.67  <0.0001 

Here, $R^2=0.445$; it is Nagelkerke's $R^2$, computed as $\left(1-\exp(-\text{LR}/n)\right)/\left(1-\exp(2L_0/n)\right)$, where LR is the $\chi^2$ statistic (comparing the two nested models you described), whereas the denominator is just the maximum attainable value of $R^2$. For a perfect model, we would expect $\text{LR}=-2L_0$ (that is, $L_1=0$), giving $R^2=1$.

By hand,

> mod0 <- update(mod1, .~.-x)
> lr.stat <- lrtest(mod0, mod1)
> (1-exp(-as.numeric(lr.stat$stats[1])/n))/(1-exp(2*as.numeric(logLik(mod0)/n)))
[1] 0.4445742
> mod1b$stats["R2"]
       R2 
0.4445742 
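Note that the $R^2$ reported by lrm is Nagelkerke's, not the $1 - \frac{L_1}{L_0}$ (McFadden) index quoted from the Stata page in the question. If you want that one, it drops out of the same two nested fits:

# McFadden's pseudo-R^2, i.e. 1 - L1/L0, from the same two fits
1 - as.numeric(logLik(mod1)) / as.numeric(logLik(mod0))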

Ewout W. Steyerberg discusses the use of $R^2$ with GLMs in his book Clinical Prediction Models (Springer, 2009, § 4.2.2, pp. 58-60). Basically, the relationship between the LR statistic and Nagelkerke's $R^2$ is approximately linear (and more nearly linear at low incidence). As discussed in the earlier thread I linked to in my comment, you can also use other measures, like the $c$ statistic, which is equivalent to the AUC (there is also a nice illustration in the above reference, see Figure 4.6).
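As a side note, the $c$ statistic shown in the lrm printout (C = 0.852 above) can be recovered directly from the fitted probabilities with Hmisc's somers2 function (Hmisc is attached along with rms); a minimal sketch:

# c statistic (equivalent to the AUC) from the fitted probabilities;
# this should reproduce the C = 0.852 in the lrm output above
library(Hmisc)
somers2(fitted(mod1), as.numeric(y) - 1)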