I'd like to perform a fixed effects panel regression with two IVs (x1 and x2) and one DV (y), using robust standard errors. In Python I used the following command:
from linearmodels.panel import PanelOLS
import statsmodels.api as sm2

result = PanelOLS(data.y, sm2.add_constant(data[['x1', 'x2']]), entity_effects=True).fit(cov_type='robust')
print(result)
resulting in:
                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                      y   R-squared:                        0.0008
Estimator:                   PanelOLS   R-squared (Between):             -0.0212
No. Observations:               34338   R-squared (Within):               0.0008
Date:                Tue, May 05 2020   R-squared (Overall):           3.076e-05
Time:                        11:29:40   Log-likelihood                -4.647e+05
Cov. Estimator:                Robust
                                        F-statistic:                      13.569
Entities:                        1304   P-value                           0.0000
Avg Obs:                       26.333   Distribution:                 F(2,33805)
Min Obs:                       0.0000
Max Obs:                       75.000   F-statistic (robust):             71.477
                                        P-value                           0.0000
Time periods:                      88   Distribution:                 F(2,33805)
Avg Obs:                       390.20
Min Obs:                       0.0000
Max Obs:                       499.00

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const       5.472e+05     6478.7     84.469     0.0000   5.346e+05   5.599e+05
x1            -758.82     70.912    -10.701     0.0000     -897.81     -619.83
x2            -322.77     60.629    -5.3238     0.0000     -441.61     -203.94
==============================================================================

F-test for Poolability: 1337.3
P-value: 0.0000
Distribution: F(530,33805)

Included effects: Entity
Because the results seemed a bit off, I tried to replicate them in Stata, using:
xtreg y x1 x2, fe vce(robust)
resulting in:
Fixed-effects (within) regression               Number of obs      =     34338
Group variable: ID                              Number of groups   =       531

R-sq:  within  = 0.0008                         Obs per group: min =         1
       between = 0.0010                                        avg =      64.7
       overall = 0.0004                                        max =        75

                                                F(2,530)           =      7.99
corr(u_i, Xb)  = -0.0205                        Prob > F           =    0.0004

                              (Std. Err. adjusted for 531 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -758.8212   202.0153    -3.76   0.000     -1155.67   -361.9723
          x2 |  -322.7749   219.6023    -1.47   0.142    -754.1727    108.6229
       _cons |   547249.1    22976.5    23.82   0.000     502112.9    592385.3
-------------+----------------------------------------------------------------
     sigma_u |  1266542.2
     sigma_e |  183793.32
         rho |  .97937616   (fraction of variance due to u_i)
------------------------------------------------------------------------------
The results are different. In particular, the p-value for x2 as well as the average and minimum observations per group seem to be off. I do not understand what I am doing wrong in the Python version. Is the command I'm using correct? Have I missed a fundamental difference between the two models?
EDIT: As @Jesper for President pointed out, there are some differences in the way Stata and Python interpret the data. Here is what I have found out so far: my time variable consists of dates. As some dates are missing, Python seems to fill in the missing ones (Stata Obs per group max: 75 vs. Python Time periods: 88). Further, Stata's vce(robust) does not seem to do the same as Python's cov_type='robust'. From the manuals, I understand that both use a White sandwich estimator of variance. Nevertheless, while the results without robust standard errors are almost identical (the difference in observations is the same as with robust SEs), including them leads to the difference in p-values shown above. Can anybody help me understand the problem?
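The time-period discrepancy can be reproduced with a toy unbalanced panel (synthetic data, pandas only): linearmodels reports the number of distinct dates in the whole sample, while Stata's "Obs per group: max" is the largest observation count within any single group, so the former can exceed the latter whenever no single entity observes every date.

```python
import pandas as pd

# Synthetic unbalanced panel: each entity observes only a subset of
# dates, so no single entity sees every date in the sample.
df = pd.DataFrame({
    'entity': ['A', 'A', 'B', 'B', 'C'],
    'date': pd.to_datetime(['2020-01-31', '2020-02-29',
                            '2020-02-29', '2020-03-31', '2020-04-30']),
})

# What linearmodels reports as "Time periods": distinct dates overall.
time_periods = df['date'].nunique()
# What Stata reports as "Obs per group: max": largest count per entity.
max_obs_per_group = df.groupby('entity').size().max()

print(time_periods, max_obs_per_group)  # 4 distinct dates, max 2 obs per entity
```

Here 4 > 2 for the same data, mirroring 88 vs. 75 above; neither program drops any observations, they just summarize the panel differently.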
Best Answer
If anybody has a similar question: I found my mistake. According to the linearmodels documentation, the robust SEs used above (White's robust covariance) are not robust in the presence of fixed effects; a clustered covariance is required instead. Stata seems to apply this automatically. The fix is to fit with entity-clustered standard errors (cov_type='clustered' together with cluster_entity=True).
This leads to output that is close enough to the Stata version for me to trust the results.