Predicted probabilities from probit

probit, stata

Assume the following probit model:

$P(y_i = 1) = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 d_1 + \beta_4 d_2)$
where $\Phi$ is the cumulative distribution function of the standard normal, and $d_1$ and $d_2$ are dummies

or in Stata:

generate xsq = x1^2
probit y_i x1 xsq d1 d2

Now I want to predict the probability $\hat{P}(y_i = 1)$ for each observation $i$. This seems very simple, but I keep failing to get it right in Stata.

I tried:

predict pr, xb

But this gives me values greater than 1.

Any ideas?

Best Answer

This is on the face of it a Stata question, but there is a statistical confusion at its core. Here are the wrong and the right syntax for what you want, illustrated with Stata's auto data.

. sysuse auto, clear
(1978 Automobile Data)

. probit foreign mpg weight

Iteration 0:   log likelihood =  -45.03321
Iteration 1:   log likelihood = -29.244141
Iteration 2:   log likelihood = -27.041557
Iteration 3:   log likelihood =  -26.84658
Iteration 4:   log likelihood = -26.844189
Iteration 5:   log likelihood = -26.844189

Probit regression                                 Number of obs   =         74
                                                  LR chi2(2)      =      36.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -26.844189                       Pseudo R2       =     0.4039

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0515689    -2.02   0.044    -.2050235   -.0028772
      weight |  -.0023355   .0005661    -4.13   0.000     -.003445   -.0012261
       _cons |   8.275464   2.554142     3.24   0.001     3.269438    13.28149
------------------------------------------------------------------------------

. predict pr
(option pr assumed; Pr(foreign))

. predict xb, xb

. su pr xb

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          pr |        74     .294487    .3074146   9.52e-06   .9029781
          xb |        74   -.9904795    1.376307  -4.275976   1.298709

As the help explains (see help probit postestimation), the default for predict after probit is to give predicted probabilities, and that is what you want. By specifying the xb option, you got the linear predictor $x_i'\hat{\beta}$, which is unbounded: in the summary above, xb runs from about -4.28 to 1.30, while pr stays between 0 and 1. You could get the probabilities by pushing the linear predictor through the cumulative standard normal distribution (normal() in Stata), but just using the default gets you there directly. In essence you want a back-transformation to the probability scale, and that is so common a need that Stata (and presumably any good statistical software) provides it directly.
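A minimal sketch of that back-transformation, assuming the same auto data example: it builds the probabilities by hand with normal() applied to the linear predictor and checks that they agree with what predict gives by default. The variable name pr_manual is illustrative, and storing the predictions as double is a choice made here only to keep the comparison tight.

```stata
* Sketch: recover probit predicted probabilities from the linear predictor
sysuse auto, clear
probit foreign mpg weight

predict double pr, pr      // predicted probability Pr(foreign); the default statistic
predict double xb, xb      // linear predictor x*b; unbounded, can exceed 1 or go below 0

* Back-transform to the probability scale with the standard normal CDF
generate double pr_manual = normal(xb)

* The hand-made probabilities should match the default predictions
assert reldif(pr, pr_manual) < 1e-6
summarize pr pr_manual xb
```

For the model in the question, the same logic applies: after probit y_i x1 xsq d1 d2, a plain predict phat (phat being a hypothetical name) returns $\hat{P}(y_i = 1)$ for each observation, with no need to call normal() yourself.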
