How is the sigma^2 value (or MSE) for the link function computed in logistic regression in R?

Tags: generalized linear model, logistic, r, regression

For example, if you have a logistic regression on certain dataset:

fit <- glm(y ~ x, data = test, family = "binomial")

If you call predict(fit, newdata, type = "link", se.fit = TRUE), you get a list with a component named se.fit, which is the standard error of each predicted value on the link scale.

My questions are:

  1. How is the MSE value for the link function computed here?

    The variance of the fitted coefficients is basically the MSE times the variance-covariance matrix, so there should be a way to compute the MSE value first. But for response variables that take the values 0 and 1, the link function corresponds to 0 and infinity. In this case, how does the model compute this value? Is there any way I can get the MSE value for the glm fit in R?

  2. Is se.fit the standard error for the link function value of the fitted line at point x0, or the standard error for the predicted link function value of y at point x0?

Best Answer

There are some statistical misunderstandings here.

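  1. The mean squared error (MSE) is primarily associated with linear (OLS) models. It isn't really used with logistic regression. For example, estimating the MSE and multiplying it by the unscaled covariance matrix (X'X)^-1 to get the variance of the coefficients is something that is done in linear regression, but not in logistic regression. The binomial family has no separate error-variance parameter to estimate: R fixes the dispersion at 1, and the variance of each observation is determined by its fitted probability. You should not be trying to get the MSE from a glm model fitted with family=binomial.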

    The linear predictor (which I believe is what you mean by "link function" here) is not bounded by 0 and infinity; it ranges from -infinity to +infinity.

  2. The se.fit value is on the scale of the linear predictor (i.e., the log odds of Y=1 at X=x0). It applies to both the "fitted line at point x0" and the "predicted link function value of y at point x0", as they are the same thing.
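As a rough illustration of point 1, the sketch below fits a logistic regression to simulated data and checks that no MSE-like scale enters the coefficient covariance. The data are made up (only the names test, x, and y follow the question), and the comparison with vcov(fit) assumes the default logit link.

```r
# Simulated data: the values are invented for illustration only.
set.seed(1)
test <- data.frame(x = rnorm(200))
test$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * test$x))

fit <- glm(y ~ x, data = test, family = "binomial")

# The binomial family fixes the dispersion at 1; nothing like an MSE is estimated.
disp <- summary(fit)$dispersion

# Coefficient covariance is (X' W X)^-1, with W the working weights from the
# final IRLS iteration (for the logit link these are p * (1 - p)).
X <- model.matrix(fit)
w <- weights(fit, type = "working")
V_manual <- solve(t(X) %*% (X * w))

# The linear predictor (log odds) is unbounded; only the fitted
# probabilities are confined to (0, 1).
eta <- predict(fit, type = "link")

print(disp)
print(all.equal(unname(V_manual), unname(vcov(fit)), tolerance = 1e-6))
print(range(eta))
```

If the two covariance matrices agree, the standard errors reported by summary(fit) come from the weighted design matrix alone, with no residual-variance multiplier.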

In general, the standard error of a predicted point on the scale of the linear predictor needs to take into account the uncertainty of the estimated slope and intercept, and also how far the x-value of the predicted point is from the mean of x.
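A minimal sketch of how se.fit can be reproduced by hand, assuming simulated data and three hypothetical x0 values (neither comes from the question): the variance of the predicted log odds at x0 is x0' V x0, where V = vcov(fit) carries the uncertainty in the intercept and slope.

```r
# Simulated data: the values are invented for illustration only.
set.seed(1)
test <- data.frame(x = rnorm(200))
test$y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * test$x))
fit <- glm(y ~ x, data = test, family = "binomial")

# Hypothetical points at which to predict.
newdata <- data.frame(x = c(-2, 0, 2))
pred <- predict(fit, newdata, type = "link", se.fit = TRUE)

# Design matrix for the new points: a column of 1s (intercept) and x0.
X0 <- cbind(1, newdata$x)

# Row-wise quadratic form x0' V x0, then the square root.
se_manual <- sqrt(rowSums((X0 %*% vcov(fit)) * X0))

print(cbind(x0 = newdata$x, se.fit = pred$se.fit, se_manual = se_manual))

# Approximate 95% interval on the log-odds scale, mapped back to probabilities.
lower <- plogis(pred$fit - 1.96 * pred$se.fit)
upper <- plogis(pred$fit + 1.96 * pred$se.fit)
print(cbind(x0 = newdata$x, lower = lower, upper = upper))
```

Building the interval on the link scale and then applying plogis keeps the endpoints inside (0, 1), which an interval built directly on the probability scale would not guarantee.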