Solved – Explained variance in logistic regression based on regression coefficients

logistic, nonlinear regression, r-squared, regression, variance

I am wondering about the relation between the explained variance and the regression coefficients in logistic regression. So given a multiple linear regression

$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki} + e_i$,

I know that (for standardized predictors and response) $R^2$ simplifies to

$R^2 = \sum_{k=1}^{K} \beta^2_k + 2 \sum_{k<k'} \beta_k \beta_{k'} \rho_{kk'}$.

So given that we know the values of the regression coefficients and the correlations $\rho_{kk'}$ between the predictors, we can determine $R^2$ in the linear case. Is there an equivalent in logistic regression? How would it work?
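The coefficient-based formula above can be checked numerically. The sketch below (hypothetical simulated data, not from the question) standardizes everything with the sample mean and standard deviation, fits ordinary least squares, and compares the usual $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ with the quadratic form $\beta^\top R \beta$, which expands to exactly the sum in the formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two correlated predictors (hypothetical example data)
rho = 0.4
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

# Standardize with the sample mean/sd so no intercept is needed
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()

# OLS on the standardized data
beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# R^2 the usual way: 1 - SSE/SST
resid = ys - Xs @ beta
r2_direct = 1 - (resid @ resid) / (ys @ ys)

# R^2 from the coefficients: beta' R beta, where R is the sample
# correlation matrix of the predictors; this expands to
# sum_k beta_k^2 + 2 * sum_{k<k'} beta_k beta_k' rho_kk'
R = (Xs.T @ Xs) / n
r2_formula = beta @ R @ beta

assert np.isclose(r2_direct, r2_formula)
```

The agreement is an algebraic identity for in-sample OLS on standardized data, not an approximation, which is why the assertion holds to machine precision.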

Best Answer

The best basis for an explained-variation measure with a binary $Y$ is the variance of the predicted probabilities. This and related measures are discussed here, which references important articles by Kent & O'Quigley and Choodari-Oskooei et al.

It doesn't help much to expand the formula as you did, but the analogy to ordinary linear models is very helpful. Think of partitioning the total sum of squares into the regression and error sums of squares: SST = SSR + SSE. $R^2$ is essentially var(predictions) / var(raw $Y$). var(predictions) is easy to compute; for non-ordinary linear models we have to work on var(raw $Y$), as Kent & O'Quigley did. The blog article goes into this in more detail.
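As a rough illustration of the idea (a sketch on hypothetical simulated data, and only one of several variants discussed in the references): fit a logistic regression, take the variance of the predicted probabilities as the "explained" part, and relate it to the variance of the binary outcome, $\bar p(1-\bar p)$. The fit below uses plain Newton–Raphson (IRLS) so the example needs only numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 2))
true_logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit logistic regression by Newton-Raphson (IRLS); no intercept is
# needed here because the simulated linear predictor has none
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)                 # score
    H = (X * (p * (1 - p))[:, None]).T @ X  # observed information
    beta += np.linalg.solve(H, grad)

p_hat = 1 / (1 + np.exp(-X @ beta))

# Explained-variation measure: variance of the predicted probabilities
# relative to the variance of the binary outcome, var(Y) = p(1-p)
r2_like = p_hat.var() / y.var()
print(round(r2_like, 3))
```

The numerator, var(predictions), is straightforward; choosing the right denominator for the non-Gaussian $Y$ is exactly the issue Kent & O'Quigley address, so treat `r2_like` as one candidate measure rather than *the* definition.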
