The two estimators are computed differently but yield numerically identical coefficient estimates, so in that sense it does not matter which one you use. The within estimator is computationally cheaper since it keeps the size of the design matrix in check, and that is how the fixed-effects estimator is typically implemented. Here is some R code to demonstrate the equivalence:
library(plm)
data("Produc", package = "plm")
plmResults <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc,
index = c("state","year"), model = "within")
summary(plmResults)
regResults <- lm(log(gsp) ~ as.factor(state) + log(pcap) + log(pc) + log(emp) + unemp,
data = Produc)
summary(regResults)
Or, if you prefer, some Stata code:
webuse nlswork
xtset idcode
xtreg ln_w grade c.age##c.age c.ttl_exp##c.ttl_exp c.tenure##c.tenure ///
2.race not_smsa south, fe
areg ln_w grade c.age##c.age c.ttl_exp##c.ttl_exp c.tenure##c.tenure ///
2.race not_smsa south, absorb(idcode)
A proof using the Frisch-Waugh-Lovell theorem can easily be given. Note one crucial point: as the number of groups grows, that is, as $n\to \infty$ with $T$ fixed, the estimates of the coefficients on the group dummies are not consistent.
The unobserved effects model is written as:
\begin{equation}
y_{it} = \mathbf{x}_{it}\beta + u_{it}
\end{equation}
where
\begin{equation}
u_{it} = c_{i} + \lambda_{t} + v_{it}
\end{equation}
A one-way error model assumes $\lambda_{t} = 0$, while a two-way error model allows $\lambda_{t} \in \mathbb{R}$; that is the answer to the first question.
The second question cannot be answered without more assumptions about the error structure or the purpose of the study. Following Wooldridge (2010), chapters 10 and 11, one can generalize each of the assumptions to cover the temporal error component as well. For example, when considering pooled OLS (POLS), the critical assumption is $\mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}u_{it}\right) = 0$. In the chapter it is summarized as meeting the following conditions:
- $\mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}c_{i}\right) = 0$
- $\mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}v_{it}\right) = 0$
However, if one does not assume $\lambda_{t} = 0$, i.e., in a two-way error model, a third condition must be satisfied for consistency of the POLS estimator:
\begin{equation}
\mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}\lambda_{t}\right) = 0
\end{equation}
and so on.
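Since $u_{it} = c_{i} + \lambda_{t} + v_{it}$, these component conditions simply split the composite moment condition by linearity of the expectation:
\begin{equation}
\mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}u_{it}\right) = \mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}c_{i}\right) + \mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}\lambda_{t}\right) + \mathop{\mathbb{E}}\left(\mathbf{x}_{it}^{\prime}v_{it}\right) = 0
\end{equation}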
To estimate the fixed effects themselves, one can go with LSDV (including indicators for both the panel ID and the time ID), but the dimension of the design matrix can quickly become infeasible. One alternative is to use the one-way error within estimator and include time dummies, as one usually does in software that does not allow for two-way error models, such as Stata. A third, and the most efficient, way is the two-way error within estimator, which uses the transformation
\begin{equation}
y_{it} - \bar{y}_{i.} - \bar{y}_{.t} + \bar{y}_{..} = \left(\mathbf{x}_{it} - \bar{\mathbf{x}}_{i.} - \bar{\mathbf{x}}_{.t} + \bar{\mathbf{x}}_{..}\right)\beta + \left(v_{it} - \bar{v}_{i.} - \bar{v}_{.t} + \bar{v}_{..}\right)
\end{equation}
This transformation is implemented in several statistical packages, such as the R package plm, which correctly adjusts the degrees of freedom for the additional $T - 1$ parameters relative to the one-way error within estimator.
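As a sketch, the double-demeaning transformation can be applied by hand and checked against LSDV with both sets of dummies. The example below uses simulated data (a hypothetical balanced panel, not from the sources above) and only base R:

```r
# Manual double-demeaning versus LSDV on a toy balanced panel
# (hypothetical simulated data; base R only).
set.seed(42)
n <- 30; T <- 6
i <- factor(rep(1:n, each = T))   # panel id
t <- factor(rep(1:T, times = n))  # time id
x <- rnorm(n * T)
y <- 1.5 * x + rnorm(n)[i] + rnorm(T)[t] + rnorm(n * T)

demean2 <- function(z, i, t) {
  # z_it - zbar_i. - zbar_.t + zbar_..
  z - ave(z, i) - ave(z, t) + mean(z)
}

# OLS without intercept on the transformed variables...
b_within2 <- coef(lm(demean2(y, i, t) ~ demean2(x, i, t) - 1))

# ...matches the LSDV slope with both panel and time dummies
b_lsdv <- coef(lm(y ~ x + i + t))["x"]
all.equal(unname(b_within2), unname(b_lsdv))  # TRUE on a balanced panel
```

Note that the exact equality relies on the panel being balanced; with unbalanced data the simple double-demeaning formula no longer projects out both sets of dummies.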
Most two-way error model estimators are limited to balanced panels; only a handful handle unbalanced data. For short panels, running the one-way error within estimator with time dummies is feasible. As a side note, even if one obtains estimates of the temporal effects, notice that, as with the LSDV fixed effects in one-way error models, these are not consistent when the number of parameters grows with the dimension of the panel.
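On a balanced panel, the one-way within estimator with time dummies and the two-way within estimator give the same slope coefficients. A sketch with plm, reusing the Produc data from the first example (Produc is balanced):

```r
# One-way within estimator plus time dummies versus the two-way within
# estimator, using the balanced Produc panel from above.
library(plm)
data("Produc", package = "plm")

oneway <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp + factor(year),
              data = Produc, index = c("state", "year"),
              model = "within", effect = "individual")

twoway <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
              data = Produc, index = c("state", "year"),
              model = "within", effect = "twoways")

# The slope coefficients coincide on a balanced panel
coef(twoway)
coef(oneway)[names(coef(twoway))]
```

The standard errors can differ slightly because of the degrees-of-freedom accounting mentioned above.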
I recommend Baltagi's (2013) textbook for a fairly comprehensive treatment of the estimators for one-way and two-way error models.
References:
Baltagi, Badi H. 2013. Econometric Analysis of Panel Data. Fifth Edition. Chichester, West Sussex: John Wiley & Sons. ISBN: 978-1-118-67232-7.
Croissant, Yves, and Giovanni Millo. 2008. "Panel Data Econometrics in R: The plm Package." Journal of Statistical Software 27 (2). doi:10.18637/jss.v027.i02.
StataCorp. 2017. Stata 15 Base Reference Manual. College Station, TX: Stata Press.
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Second Edition. Cambridge, MA: The MIT Press. ISBN: 978-0-262-23258-8.
Best Answer
The two are equivalent.
The second version uses the Frisch-Waugh-Lovell theorem, which says that you can compute a subset of the regression coefficients (here, $\hat\beta$) by (1) regressing $y$ on the other regressors (here, $D$) and saving the residuals (here, the demeaned $y$, or $M_{[D]}y$, because regressing on the group dummies just demeans the variables within groups), (2) regressing $X$ on $D$ and saving the residuals $M_{[D]}X$, and (3) regressing the residuals on each other, $M_{[D]}y$ on $M_{[D]}X$.
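The three steps can be checked numerically. A sketch on a toy panel (hypothetical simulated data, base R only):

```r
# Numerical check of the three FWL steps on a toy panel
# (hypothetical simulated data; base R only).
set.seed(1)
n <- 50; T <- 5
id <- factor(rep(1:n, each = T))  # panel id, plays the role of D
x  <- rnorm(n * T)
y  <- 0.5 * x + rnorm(n)[id] + rnorm(n * T)

# (1) residuals of y on the dummies D, i.e. M_[D] y
My <- resid(lm(y ~ id))
# (2) residuals of x on the dummies D, i.e. M_[D] X
Mx <- resid(lm(x ~ id))
# (3) regress the residuals on each other (no intercept needed)
beta_fwl <- coef(lm(My ~ Mx - 1))

# LSDV gives the same slope
beta_lsdv <- coef(lm(y ~ id + x))["x"]
all.equal(unname(beta_fwl), unname(beta_lsdv))  # TRUE
```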
The second version is indeed much more widely used: typical panel data sets may have thousands of panel units, so the first approach would require running a regression with thousands of regressors, which is numerically unwise even nowadays with fast computers.