Fixed Effects Model – Comparing Python’s Linearmodels PanelOLS with Stata’s xtreg, fe

fixed-effects-modelpythonstata

I'd like to perform a fixed effects panel regression with two IVs (x1 and x2) and one DV (y), using robust standard errors. In Python I used the following command:

result = PanelOLS(data.y, sm2.add_constant(data[['x1', 'x2']]), entity_effects=True).fit(cov_type='robust')        
result

resulting in:

                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                      y   R-squared:                        0.0008
Estimator:                   PanelOLS   R-squared (Between):             -0.0212
No. Observations:               34338   R-squared (Within):               0.0008
Date:                Tue, May 05 2020   R-squared (Overall):           3.076e-05
Time:                        11:29:40   Log-likelihood                -4.647e+05
Cov. Estimator:                Robust                                           
                                        F-statistic:                      13.569
Entities:                        1304   P-value                           0.0000
Avg Obs:                       26.333   Distribution:                 F(2,33805)
Min Obs:                       0.0000                                           
Max Obs:                       75.000   F-statistic (robust):             71.477
                                        P-value                           0.0000
Time periods:                      88   Distribution:                 F(2,33805)
Avg Obs:                       390.20                                           
Min Obs:                       0.0000                                           
Max Obs:                       499.00                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const       5.472e+05     6478.7     84.469     0.0000   5.346e+05   5.599e+05
x1            -758.82     70.912    -10.701     0.0000     -897.81     -619.83
x2            -322.77     60.629    -5.3238     0.0000     -441.61     -203.94
==============================================================================

F-test for Poolability: 1337.3
P-value: 0.0000
Distribution: F(530,33805)

Included effects: Entity

Because the results seamed a bit off I tried to replicate them with Stata, using:

xtreg y x1 x2, fe vce(robust)

resulting in:

Fixed-effects (within) regression               Number of obs      =     34338
Group variable: ID                              Number of groups   =       531

R-sq:  within  = 0.0008                         Obs per group: min =         1
       between = 0.0010                                        avg =      64.7
       overall = 0.0004                                        max =        75

                                                F(2,530)           =      7.99
corr(u_i, Xb)  = -0.0205                        Prob > F           =    0.0004

                                   (Std. Err. adjusted for 531 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -758.8212   202.0153    -3.76   0.000     -1155.67   -361.9723
          x2 |  -322.7749   219.6023    -1.47   0.142    -754.1727    108.6229
       _cons |   547249.1    22976.5    23.82   0.000     502112.9    592385.3
-------------+----------------------------------------------------------------
     sigma_u |  1266542.2
     sigma_e |  183793.32
         rho |  .97937616   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The results are different. Especially the difference in p-value for x2, the average and min observations seem to be off. I do not understand what I am doing wrong in the Python version. Is the command I’m using correct? Have I missed a fundamental difference within the two models?


EDIT: as @Jesper for President pointed out there are some differences in the way Stata and Python interpret the data. Here is what I found out so far: My time variable is dates. As some dates are missing, Python seems to fill up the missing ones (Stata Obs per group max: 75 vs. Python Time Periods: 88). Further, Stata's vce(robust) does not seem to do the same like Pythons cov_type='robust'. By reading the manuals, I understand that both are including White-Sandwich estimator of variance. Nevertheless, while the results without robust standard errors are almost identical (difference in observations is the same like for robust SE), including them leads to the difference in p-values presented here. Can anybody help me to further understand the problem?

Best Answer

If anybody has a similar question, I found my mistake. According to the linearmodels manual the robust SE used (White's robust covarinace) is not robust for fixed effects. Instead clustered covariance is required. Stata seems to adapt that automatically. Therefore the correct command is:

from linearmodels import PanelOLS
fe_mod = PanelOLS(df.y,sm2.add_constant(df[['x1' , 'x2']]), entity_effects=True)
fe_res = fe_mod.fit(cov_type='clustered', cluster_entity=True)
print(fe_res)

This leads to:

                              PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                      y   R-squared:                        0.0008
Estimator:                   PanelOLS   R-squared (Between):             -0.0212
No. Observations:               34338   R-squared (Within):               0.0008
Date:                Tue, May 05 2020   R-squared (Overall):           3.076e-05
Time:                        18:04:12   Log-likelihood                -4.647e+05
Cov. Estimator:             Clustered                                           
                                        F-statistic:                      13.569
Entities:                        1304   P-value                           0.0000
Avg Obs:                       26.333   Distribution:                 F(2,33805)
Min Obs:                       0.0000                                           
Max Obs:                       75.000   F-statistic (robust):             8.0007
                                        P-value                           0.0003
Time periods:                      88   Distribution:                 F(2,33805)
Avg Obs:                       390.20                                           
Min Obs:                       0.0000                                           
Max Obs:                       499.00                                           
                                                                                
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const       5.472e+05  2.296e+04     23.840     0.0000   5.023e+05   5.922e+05
x1            -758.82     201.83    -3.7597     0.0002     -1154.4     -363.23
x2            -322.77     219.40    -1.4712     0.1413     -752.80      107.25
==============================================================================

F-test for Poolability: 1337.3
P-value: 0.0000
Distribution: F(530,33805)

Included effects: Entity

Which is close enough to the Stata version for me to trust the results.