Solved – Difference between an “Ordinary Least Square (OLS) model” and a “Panel Fixed- Effects (FE) model”

difference-in-differenceeconometricsfixed-effects-modelleast squarespanel data

I recently ran into a paper that described the following:

To test the robustness of each specification, we used a
difference-in-difference (DID) estimator to control for time invariant
factors that jointly affected control and treated units. We estimated
the DID with i) an Ordinary Least Square (OLS) model and with ii) a
Panel Fixed-Effects (FE) model.

Source: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0119881#sec002

It is my understanding that an OLS model is (or at least it can be) based on a panel data, where the covariates are treated as fixed effects, right? So what is the difference between the two models tested in the paper?

Best Answer

It is economists' speech for saying that for $y_{it}$ being the observation for individual $i$ at time $t$ and $x_{it}$ being the regressor vector for individual $i$ at time $t$, they ran two models: \begin{align} y_{it} & = \beta_0 + x_{it}^T\beta + \varepsilon_{it}, \end{align} which they call OLS and for which $\beta_0$ is the overall intercept, and \begin{align} y_{it} & = x_{it}^T\beta + \alpha_i + \varepsilon_{it}, \end{align} which they call a Fixed Effects Model and for which $\alpha_i$ is the individual-specific intercept. The terminology is somewhat confusing, since you still estimate your Fixed Effects Model using the OLS estimator, but you will transform the data first using the within-transformation: \begin{align} y_{it} -\bar{y}_i & = (x_{it}^T - \bar{x}_i)\beta + (\alpha_i -\bar{\alpha}_i) + \varepsilon_{it}, \end{align} which you apply to take out the individual-specific term $\alpha_i$. Since you expect it to be fixed across time, $\bar{\alpha}_i = \alpha_i$ and so the term vanishes when you run the actual regression. If you want to estimate the $\alpha_i$ terms, you can do that. Economists call the resulting model the Random effects model. This model has the problem that you need to estimate an additional parameter per individual, and thus your standard errors get larger as you have more individuals, which is why Fixed Effects are preferred if $N$ is large relative to $T$.

The difference is really only the inclusion of an individual-specific intercept term ($= \alpha_i$) to replace the overall intercept ($=\beta_0$). Usually, you try to capture individual-specific heterogeneity with the $\alpha_i$, so it serves as a kind of 'catch all' term. (E.g., if $y_{it}$ is the IQ and $x_{it}$ are some environmental factors, you could interpret $\alpha_i$ as a kind of catch-all term for an individual's genetic predisposition and any other unmeasured individual-specific variables that influence the IQ)

Related Solutions

Difference-in-Differences Fixed Effects vs OLS in Regression

@Charlie is right. You only have two time periods, so there will inevitably be variation in the $i$-specific sample variances of $x_{it}$. In addition, even if you have programmed the simulation for there to be homogenous effects, due to small number of periods there will inevitably be some sample correlation between $x_{it}$ and, e.g., your error term, and so there will inevitably be some "effect heterogeneity" in the $i$-specific partial relationships between $x_{it}$ and $y_{it}$. The interaction of conditional variance and effect heterogeneity tilts your FE estimates of coefficients on $x_{it}$. The coefficient on $x_{it}$ is a precision-weighted average of the $i$-specific coefficients on $x_{it}$. A different tilting occurs when you fit OLS to the model that you have specified above: now, the coefficient on $x_{it}$ is a precision weighted average of the coefficients on $x_{it}$ for the those with $treatment_i=1$ and those with $treatment_i=0$. These differences propagate to your estimates of $\beta_3$. Think Frisch-Waugh-Lovell. To demonstrate the validity of Charlie's claim, simply generate $x_{it}$'s where the variance is exactly constant for each $i$, but you still have different patterns. E.g, randomly assign $i$'s to have either $(x_{i1}, x_{i2})=(0,1)$ or $(1,0)$. If you do this, you will see that the differences between the FE and OLS estimates disappears.

Fixed Effects Dummies vs Estimator – Differences in Panel Data Models

To see equality, let us first derive the FE estimator.

Define the residual-maker matrix \begin{align*} \underset{(M\times M)}{\mathbf{Q}}&:=\mathbf{I}_M-\mathbf{1}_M(\mathbf{1}_M'\mathbf{1}_M)^{-1}\mathbf{1}_M'\\ &=\mathbf{I}_M-\left(% \begin{array}{ccc} 1/M & \cdots & 1/M \\ \vdots & \ddots & \vdots \\ 1/M & \cdots & 1/M \\ \end{array}% \right)\mathbf{1}_M\mathbf{1}_M', \end{align*} where $M$ denotes the number of observations per individual unit in the panel.

Premultiplication with $\mathbf{Q}$ centers the $\mathbf{y}_i$ and $\mathbf{Z}_i$ around their averages over $m$, \begin{align*} \mathbf{Q}\mathbf{y}_i&=\mathbf{y}_i-\mathbf{1}_M\mathbf{1}_M'\mathbf{y}_i/M\\&=\mathbf{y}_i-\mathbf{1}_M\overline{y_{i}}. \end{align*} The also implies that every time invariant variable from the set of regressors $\mathbf{Z}_i$ turns into a column of zeros, and hence is eliminated from the data.

This is a serious disadvantage of the FE estimator. Consider the example of wage regressions for a panel of employees. Variables such as gender or schooling are of primary interest, but (typically) do not change over time (anymore).

As $\mathbf{Q}\mathbf{1}_M=\mathbf{0}$, we have that, using the error-component model $\mathbf{y}_i=\mathbf{Z}_i\mathbf{\delta}+\mathbf{1}_M\alpha_i+\mathbf{\eta}_{i}$, where $\eta_i$ denotes the $M$-vector of idiosyncratic time-varying errors, \begin{align*} \mathbf{Q}\mathbf{y}_i&=\mathbf{Q}\mathbf{F}_i\mathbf{\beta}+\mathbf{Q}\mathbf{\eta}_{i}\qquad i=1,\ldots,n\\ \tilde{\mathbf{y}}_i&\equiv\tilde{\mathbf{F}}_i\mathbf{\beta}+\tilde{\mathbf{\eta}}_{i}, \end{align*} where $\mathbf{F}_i$ is the $(M\times L_b)$-matrix of the observations on the time variant regressors. Stacking the observations over the $n$ units gives $$ \underset{(Mn\times 1)}{\tilde{\mathbf{y}}}:=\left(% \begin{array}{c} \tilde{\mathbf{y}}_1 \\ \vdots \\ \tilde{\mathbf{y}}_n \\ \end{array}% \right)\qquad\underset{(Mn\times L_b)}{\tilde{\mathbf{F}}}:=\left(% \begin{array}{c} \tilde{\mathbf{F}}_1 \\ \vdots \\ \tilde{\mathbf{F}}_n \\ \end{array}% \right) $$

The FE estimator is simply OLS applied to these $Mn$ observations: \begin{align*} \widehat{\mathbf{\beta}}_{\text{FE}}&=(\tilde{\mathbf{F}}'\tilde{\mathbf{F}})^{-1}\tilde{\mathbf{F}}'\tilde{\mathbf{y}} \end{align*}

To see the equality between FE and least squares dummy variables, stack the observations a bit further: \begin{equation} \underset{(Mn\times 1)}{\mathbf{y}}:=\left(% \begin{array}{c} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_n \\ \end{array}% \right)\;\underset{(Mn\times L_b)}{\mathbf{F}}:=\left(% \begin{array}{c} \mathbf{F}_1 \\ \vdots \\ \mathbf{F}_n \\ \end{array}% \right) \end{equation} and \begin{equation} \underset{(Mn\times 1)}{\mathbf{\eta}}:=\left(% \begin{array}{c} \mathbf{\eta}_1 \\ \vdots \\ \mathbf{\eta}_n \\ \end{array}% \right)\; \underset{(n\times 1)}{\mathbf{\alpha}}:=\left(% \begin{array}{c} \alpha_1 \\ \vdots \\ \alpha_n \\ \end{array}% \right). \end{equation}

Further, let $$ \underset{(Mn\times n)}{\mathbf{D}}:=\mathbf{I}_n\otimes\mathbf{1}_M=\left(% \begin{array}{ccc} \mathbf{1}_M & & \mathbf{O} \\ & \ddots & \\ \mathbf{O}& & \mathbf{1}_M \\ \end{array} \right) $$

Then, the linear panel data model from under an error component assumption in matrix notation is obtained as $$ \mathbf{y}=\mathbf{D}\mathbf{\alpha}+\mathbf{F}\mathbf{\beta}+\mathbf{\eta}, $$ a dummy-variable model.

That is, we can also obtain an estimator of $\mathbf{\beta}$ from an OLS regression on the regressors and $n$ individual specific effects.

Now, note that the Frisch-Waugh-Lovell Theorem says that the OLS estimator of $\mathbf{\beta}$ can be found by regressing $\mathbf{M}_{\mathbf{D}}\mathbf{y}$ on $\mathbf{M}_{\mathbf{D}}\mathbf{F}$, where $$\underset{(Mn\times Mn)}{\mathbf{M}_{\mathbf{D}}}:=\mathbf{I}-\mathbf{D}(\mathbf{D}'\mathbf{D})^{-1}\mathbf{D}'$$ Using symmetry and idempotency of $\mathbf{M}_{\mathbf{D}}$ gives \begin{equation} \widehat{\mathbf{\beta}}_{\text{LSDV}}=(\mathbf{F}'\mathbf{M}_{\mathbf{D}}\mathbf{F})^{-1}\mathbf{F}'\mathbf{M}_{\mathbf{D}}\mathbf{y} \end{equation}

Now, \begin{align*} \mathbf{M}_{\mathbf{D}}&=\mathbf{I}_{Mn}-(\mathbf{I}_n\otimes\mathbf{1}_M)[(\mathbf{I}_n\otimes\mathbf{1}_M)'(\mathbf{I}_n\otimes\mathbf{1}_M)]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M)'\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)[(\mathbf{I}_n\otimes\mathbf{1}_M')(\mathbf{I}_n\otimes\mathbf{1}_M)]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)[\mathbf{I}_n\otimes\mathbf{1}_M'\mathbf{1}_M]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)[\mathbf{I}_n\otimes M]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)\left[\mathbf{I}_n\otimes \frac{1}{M}\right](\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)\left[\mathbf{I}_n\otimes \frac{1}{M}\mathbf{1}_M'\right]\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-\mathbf{I}_n\otimes\mathbf{1}_M\frac{1}{M}\mathbf{1}_M'\\ &=\mathbf{I}_{n}\otimes\left(\mathbf{I}_{M}-\frac{1}{M}\mathbf{1}_M\mathbf{1}_M'\right)\\ &=\mathbf{I}_n\otimes\mathbf{Q} \end{align*}

Thus, \begin{align*} \mathbf{M}_{\mathbf{D}}\mathbf{F}&=(\mathbf{I}_n\otimes\mathbf{Q})\mathbf{F}\\ &=\left(% \begin{array}{ccc} \mathbf{Q} & & \\ & \ddots & \\ & & \mathbf{Q} \\ \end{array} \right)\mathbf{F}\\ &=\tilde{\mathbf{F}}, \end{align*} so that $$\widehat{\mathbf{\beta}}_{\text{LSDV}}=\widehat{\mathbf{\beta}}_{{FE}}.$$

Incidentally, while the notation works with balanced panel data, the result also goes through in the unbalanced case, as one can either check with more complicated notation or this numerical illustration:

library(plm)

# panel dimensions
n <- 10
m <- sample(2:4, n, replace=T) # unbalanced panel

# some data
alpha <- runif(n)
beta <- -2
y <- X <- y.d <- X.d <- c()
D <- matrix(0, sum(m), n) # for the dummy variable matrix
row.counter <- 0
for (i in 1:n) {
  X.n <- runif(m[i],i,i+1)
  X.d <- c(X.d, X.n - mean(X.n))
  X <- c(X,X.n)
  y.n <- alpha[i] + X.n*beta + rnorm(m[i])
  y <- c(y, y.n)
  y.d <- c(y.d, y.n - mean(y.n))
  
  D[(row.counter+1):(row.counter+m[i]), i] <- rep(1, m[i])
  row.counter <- row.counter + m[i]
}

Output:

> # plm
> paneldata <- data.frame(rep(1:n, times=m), unlist(sapply(m, function(i) 1:i)), y, X) # first two columns are for plm to understand the panel .... [TRUNCATED] 

> FE <- plm(y~X, data = paneldata, model = "within")

> # results:
> coef(FE)  # the slope coefficient
        X 
-2.331847 

> fixef(FE) # the intercepts
      1       2       3       4       5       6       7       8       9      10 
0.99396 2.30328 1.90957 2.22670 1.09438 3.10411 2.03265 4.39759 4.42384 4.15294 

> # FWL
> lm(y.d~X.d-1) # just the slope in this formulation

Call:
lm(formula = y.d ~ X.d - 1)

Coefficients:
   X.d  
-2.332  


> # LSDV
> lm(y~D+X-1) # intercepts and slope

Call:
lm(formula = y ~ D + X - 1)

Coefficients:
    D1      D2      D3      D4      D5      D6      D7      D8      D9     D10       X  
 0.994   2.303   1.910   2.227   1.094   3.104   2.033   4.398   4.424   4.153  -2.332

Best Answer

Related Solutions

Difference-in-Differences Fixed Effects vs OLS in Regression

Fixed Effects Dummies vs Estimator – Differences in Panel Data Models

Related Question