Solved – Why are p-values for python’s Statsmodels so different from those reported by Stata

pandas, python, stata, statsmodels

I'm running an OLS regression in Stata and the same one in Python's Statsmodels. Note that I adjust for clusters (on id and year).

All the outputs are very similar, if not identical.

But the Statsmodels p-values are off. For example, the coefficient on X3 has a t-value of 1.951.

Statsmodels assigns it a p-value of 0.109, while Stata returns 0.052 (as does Excel for a two-tailed test with 520 degrees of freedom).

What's going on?

Here are the statsmodels docs, which are kind of unhelpful.

Below is the output, using import statsmodels.formula.api as sm and mod = sm.ols(formula=regression_model, data=data) (a sketch of the fit call is shown below, followed by the summary):
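A minimal sketch of how such a two-way cluster-robust fit can be requested; the id and year column names, and passing both as a two-column groups array, are assumptions on my part:

import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# data: DataFrame with Y, X1..X6 and integer-coded cluster columns id and year
regression_model = "Y ~ X1 + X2 + X3 + X4 + X5 + X6"

mod = sm.ols(formula=regression_model, data=data)
res = mod.fit(
    cov_type="cluster",
    # a two-column array of group labels requests two-way clustering
    cov_kwds={"groups": np.asarray(data[["id", "year"]])},
)
print(res.summary())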

Statsmodels
==============================================================================
Dep. Variable:                      Y   R-squared:                       0.362
Model:                            OLS   Adj. R-squared:                  0.355
Method:                 Least Squares   F-statistic:                     12.90
Date:                Thu, 27 Feb 2020   Prob (F-statistic):            0.00654
Time:                        17:16:54   Log-Likelihood:                -1631.8
No. Observations:                 580   AIC:                             3278.
Df Residuals:                     573   BIC:                             3308.
Df Model:                           6                                         
Covariance Type:              cluster                                         
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept       30.4376      4.711      6.461      0.001      18.328      42.548
X1              -0.3373      0.574     -0.587      0.583      -1.814       1.139
X2              -1.0456      0.173     -6.043      0.002      -1.490      -0.601
X3               0.0293      0.015      1.951      0.109      -0.009       0.068
X4              -0.5298      0.127     -4.186      0.009      -0.855      -0.204
X5             -14.2937      3.132     -4.564      0.006     -22.344      -6.243
X6              -1.0144      0.272     -3.733      0.014      -1.713      -0.316
==============================================================================
Omnibus:                      397.145   Durbin-Watson:                   0.485
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5643.097
Skew:                           2.865   Prob(JB):                         0.00
Kurtosis:                      17.166   Cond. No.                         218.
==============================================================================

Warnings:
[1] Standard Errors are robust to cluster correlation (cluster)
==============================================================================
Stata:

. cluster2 Y X1 X2 X3 X4 X5 X6, fcluster(id) tcluster(year)

Linear regression with 2D clustered SEs                Number of obs =     580
                                                       F(  6,   573) =   32.41
                                                       Prob > F      =  0.0000
Number of clusters (id) =        99                    R-squared     =  0.3617
Number of clusters (year) =       6                    Root MSE      =  4.0575
------------------------------------------------------------------------------
  man_buffer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   30.43759   4.711003     6.46   0.000     21.18465    39.69053
         X1  |  -.3373264   .5744737    -0.59   0.557    -1.465658    .7910047
         X2  |  -1.045579   .1730091    -6.04   0.000    -1.385388   -.7057694
         X3  |    .029327    .015031     1.95   0.052    -.0001955    .0588496
         X4  |  -.5297968   .1265761    -4.19   0.000    -.7784065    -.281187
         X5  |  -14.29367   3.131845    -4.56   0.000    -20.44497   -8.142374
         X6  |  -1.014446   .2717238    -3.73   0.000    -1.548142   -.4807496

------------------------------------------------------------------------------

SE clustered by id and year

Best Answer

A default degrees-of-freedom correction is applied, controlled by the parameter df_correction:

If True (default), then the degrees of freedom for the inferential statistics and hypothesis tests, such as pvalues, f_pvalue, conf_int, and t_test and f_test, are based on the number of groups minus one instead of the total number of observations minus the number of explanatory variables.
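If the goal is to reproduce Stata's p-values, the correction can, as far as I can tell, be switched off by passing df_correction=False in cov_kwds, so that the t-statistics are again referred to a t-distribution with (observations minus parameters) degrees of freedom. A sketch, with the same assumed column names as above:

import numpy as np
import statsmodels.formula.api as sm

mod = sm.ols(formula=regression_model, data=data)
res = mod.fit(
    cov_type="cluster",
    cov_kwds={
        "groups": np.asarray(data[["id", "year"]]),
        # use nobs - k_params for inference instead of n_groups - 1
        "df_correction": False,
    },
)
print(res.summary())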

So for the parameter $X3$, the t-value of 1.951 is not compared against a t-distribution with $df = 573$ but against one with $df = 5$ (the 6 year clusters minus one). This is what you do when you consider a nested error model, in which the total error is the sum of the between-cluster error and the within-cluster error.
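The two p-values in the question can be recovered directly from the t-distribution; a quick check with scipy, using the t-value and degrees of freedom reported above:

from scipy import stats

t_value = 1.951
# df = 573 (580 observations minus 7 estimated parameters): p ~ 0.052, as in Stata
print(2 * stats.t.sf(t_value, df=573))
# df = 5 (6 year clusters minus 1): p ~ 0.109, as reported by statsmodels
print(2 * stats.t.sf(t_value, df=5))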

In your case the reduction to $df = 5$ might be very drastic. At the least, it seems a bit strange to me that you have a model with 6 regressors (plus an intercept) and apparently only 6 (effective) data points (the year clusters) with which to determine the parameters of the model (and the noise). I guess you consider the error from the groups/clusters to be of little influence, or otherwise you would not have performed the test like this.

See more: https://www.google.com/search?q=degrees+of+freedom+mixed+effects