Solved – Standard Error of prediction for Logistic Sigmoid function

logisticprediction intervalpythonstatsmodels

Standard Error of prediction for Logistic Sigmoid function

(previously: Finding the prediction interval for logistic regression)

Update 2:

This paper describes what I am looking to implement.

Update:

  • Based upon some comments I am renaming this question. I am asking for the std error to calculate prediction bands on a Logistic sigmoid function, which is fit using regression not the regression predictions, which are binary.

  • Predictions are being made using scipy.special.expit like so: prediction = [expit(x*beta + alpha) for x in a_pred]. The results look like this:

Logistic Sigmoid

Based on Kerby Shedden's answer and this I now have to methods of calcualting the SE for the prediction interval:

$ SE = \sqrt(xSx^T) $

and

$ SE = \sqrt(MSE + xSx^T) $

However, is it legitimate to caluate the MSE based on residuals from binary data as shown:

Logistic sigmoid function residuals

TL;DR

there are two type of SE(Standard error) – More Detail

How do I calculate the SE for a predicted indivdual (not the mean) put into a logistic regression model statsmodels.discrete.discrete_model.Logit

Like this for linear regression in statsmodels or this in R.

I can extract model parameter variance, covariance and std dev but that is it….the model does not return std error of predictions – how do I calculate these?

Full

I am trying to fit a prediction interval for logitistic regression model.
I am using statsmodels although I am happy hear answers using another package.

My procedure so far:

Fit the model to data df:

log_mdl = statsmodels.discrete.discrete_model.Logit.from_formula ("hit ~ a",df).fit()

The models parameters returned:

                           Logit Regression Results                           
==============================================================================
Dep. Variable:                    hit   No. Observations:                  200
Model:                          Logit   Df Residuals:                      198
Method:                           MLE   Df Model:                            1
Date:                Tue, 10 Mar 2020   Pseudo R-squ.:                  0.6209
Time:                        14:02:57   Log-Likelihood:                -45.308
converged:                       True   LL-Null:                       -119.52
Covariance Type:            nonrobust   LLR p-value:                 3.823e-34
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -4.0571      0.532     -7.624      0.000      -5.100      -3.014
a              0.0990      0.015      6.657      0.000       0.070       0.128
==============================================================================

then I make some predictions for values a_pred so I can plot the line:

log_pred = [expit(x* log_mdl.params[1] + log_MLE.params[0]) for x in a_pred]

I now wish to find the prediction interval for each prediction.

To do this I need the SE (Standard Error) of each prediction:

However there are two type of SE – More detail:

Ideally there would be a method like this for linear regression in statsmodels or this in R.

I have found this for 95% Confidence interval for of the true logit, which is the same as the martic operation in this. Which is apparently what R returns as the SE for a prediction.

95% Confidence interval for true logit

However I am not sure if this is the SE of the predicted mean (Confidence interval) or the SE of a predicted individual (Prediction interval).

I have also found this in which there are two method of calcualting the CI, one for the funct (i assume mean) and another for an observation (I assume this means a single prediction). This reflects what is said here, that there are two SEs one for the mean and another for a single prediction which includes the variation of the signal not just the accuracy of the estimated mean.

How can I find the SE of a predicted individual?

Best Answer

Something like below should work:

x = log_mdl.model.exog
c = log_mdl.cov_params()
vcov = np.dot(x, np.dot(c, x.T))
se = np.sqrt(np.diag(vcov))

If x is very large there are faster ways to do this that only compute the diagonal of vcov, i.e.

v = (x * np.dot(x, c)).sum(1)
se = np.sqrt(v)

But this is less transparent.

For terminology, I think you could refer to this as the standard error for the logit probabilities.

Related Question