Panel Data – Difference of $R^2$ Between OLS with Individual Dummies and Panel Fixed Effect Model

Tags: fixed-effects-model, least squares, panel data, r-squared

Based on my understanding, the panel fixed effects model is equivalent to OLS with individual dummies. However, when I ran the two models in R, the $R^2$ values were quite different: 0.8 for OLS with individual dummies and 0.06 for the fixed effects model.

Is it the case that in the fixed effects model, the fixed effects (individual dummies) are excluded from the calculation of $R^2$?

Best Answer

In essence, yes. The $R^2$ value reported for fixed effects regressions is often called the "within $R^2$". If you use Stata, the output gives the overall, within, and between $R^2$. If you use the plm package in R, it gives only the within $R^2$. The basic difference between the overall and the within $R^2$ is that the within version computes the total sum of squares on the demeaned outcome variable: fixed effects regression subtracts from each $y$ the mean of its own entity.

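For the fixed effects model, with $y_{it}$ the outcome of entity $i$ in period $t$ and $\bar y_i$ the mean of entity $i$, $$R^2_{within} = 1 - \dfrac{SSR}{TSS_{demeaned \ y}} = 1 - \dfrac{\sum(y_{it} - \hat y_{it})^2}{\sum([y_{it} - \bar y_i] - \overline {[y_{it} - \bar y_i]})^2}$$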
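
The gap between the two $R^2$ values comes from the denominator. As a sketch, write $y_{it}$ for the outcome of entity $i$ in period $t$, $\bar y_i$ for the entity mean, $\bar y$ for the grand mean, and $T_i$ for the number of observations on entity $i$ (notation introduced here for illustration). The overall total sum of squares then splits into a within and a between part:

$$\sum_i\sum_t (y_{it}-\bar y)^2 = \sum_i\sum_t (y_{it}-\bar y_i)^2 + \sum_i T_i\,(\bar y_i-\bar y)^2$$

Both regressions leave the same residuals, so an $R^2$ measured against the left-hand side credits the dummies with the between part, while the within $R^2$ is measured against the first term on the right only.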

To demonstrate in R using the EmplUK data from plm:

    > library(plm)
    > data("EmplUK")
    > fixed <- plm(emp ~ wage + capital, data = EmplUK,
                   index = c("firm"), model = "within")
    > fixed.dum <- lm(emp ~ wage + capital + factor(firm) - 1,
                      data = EmplUK)
    > summary(fixed.dum)$r.squared[1]
    [1] 0.9870826
    > summary(fixed)$r.squared[1]
          rsq 
    0.1635585 

    > # "Within" R2
    > SSR <- sum(fixed$residuals^2)
    > demeaned_y <- EmplUK$emp -
          tapply(EmplUK$emp, EmplUK$firm, mean)[EmplUK$firm]
    > TSS_demeaned_y <- sum((demeaned_y - mean(demeaned_y))^2)
    > within_R2 <- 1 - (SSR / TSS_demeaned_y)
    > c(summary(fixed)$r.squared[1], "rsq" = within_R2)
          rsq       rsq 
    0.1635585 0.1635585
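
The same comparison can be sketched in base R without plm. The data below are simulated and all variable names are hypothetical; this is not the EmplUK example. One caveat: `summary.lm` measures $R^2$ against the uncentered total sum of squares when the formula drops the intercept, as `fixed.dum` above does, so this sketch keeps the intercept in the dummy regression to get the usual centered $R^2$.

```r
# Simulated balanced panel (hypothetical data, for illustration only)
set.seed(1)
n  <- 20                                  # number of units
Tt <- 5                                   # observations per unit
id <- rep(seq_len(n), each = Tt)
alpha <- rnorm(n, sd = 3)                 # unit-specific effects
x <- rnorm(n * Tt) + alpha[id]
y <- alpha[id] + 0.5 * x + rnorm(n * Tt)

# Dummy-variable regression with intercept (centered R^2)
lsdv <- lm(y ~ x + factor(id))

# Within regression: subtract unit means from y and x, no intercept
y_w <- y - ave(y, id)
x_w <- x - ave(x, id)
fe_within <- lm(y_w ~ x_w - 1)

# Same residual sum of squares in both regressions
ssr_lsdv   <- sum(resid(lsdv)^2)
ssr_within <- sum(resid(fe_within)^2)

# Different total sums of squares in the denominator
r2_overall <- 1 - ssr_lsdv / sum((y - mean(y))^2)
r2_within  <- 1 - ssr_within / sum((y_w - mean(y_w))^2)

print(c(overall = r2_overall, within = r2_within))
```

Under these assumptions the two regressions share their residuals, and the within $R^2$ is smaller only because the variation between unit means has been removed from the denominator.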

Related Solutions

Fixed Effects Dummies vs Estimator – Differences in Panel Data Models

To see equality, let us first derive the FE estimator.

Define the residual-maker matrix \begin{align*} \underset{(M\times M)}{\mathbf{Q}}&:=\mathbf{I}_M-\mathbf{1}_M(\mathbf{1}_M'\mathbf{1}_M)^{-1}\mathbf{1}_M'\\ &=\mathbf{I}_M-\frac{1}{M}\mathbf{1}_M\mathbf{1}_M'\\ &=\mathbf{I}_M-\left( \begin{array}{ccc} 1/M & \cdots & 1/M \\ \vdots & \ddots & \vdots \\ 1/M & \cdots & 1/M \\ \end{array} \right), \end{align*} where $M$ denotes the number of observations per individual unit in the panel.

Premultiplication with $\mathbf{Q}$ centers the $\mathbf{y}_i$ and $\mathbf{Z}_i$ around their averages over the $M$ observations of unit $i$, \begin{align*} \mathbf{Q}\mathbf{y}_i&=\mathbf{y}_i-\mathbf{1}_M\mathbf{1}_M'\mathbf{y}_i/M\\&=\mathbf{y}_i-\mathbf{1}_M\overline{y_{i}}. \end{align*} This also implies that every time-invariant variable in the set of regressors $\mathbf{Z}_i$ turns into a column of zeros, and hence is eliminated from the data.
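
A minimal numerical sketch of this centering, with made-up numbers for one unit and a hypothetical time-invariant regressor called `gender`:

```r
M    <- 4
ones <- rep(1, M)
Q    <- diag(M) - tcrossprod(ones) / M   # I_M - (1/M) * 1 1'

y_i    <- c(2, 5, 3, 6)                  # outcomes of one unit (made up)
gender <- rep(1, M)                      # does not vary over time

qy <- drop(Q %*% y_i)                    # same as y_i - mean(y_i)
qg <- drop(Q %*% gender)                 # a column of zeros

print(qy)
print(qg)
```

The second result shows why the coefficient on a time-invariant regressor cannot be estimated after the transformation.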

This is a serious disadvantage of the FE estimator. Consider the example of wage regressions for a panel of employees. Variables such as gender or schooling are of primary interest, but (typically) do not change over time (anymore).

As $\mathbf{Q}\mathbf{1}_M=\mathbf{0}$, we have that, using the error-component model $\mathbf{y}_i=\mathbf{Z}_i\mathbf{\delta}+\mathbf{1}_M\alpha_i+\mathbf{\eta}_{i}$, where $\mathbf{\eta}_i$ denotes the $M$-vector of idiosyncratic time-varying errors, \begin{align*} \mathbf{Q}\mathbf{y}_i&=\mathbf{Q}\mathbf{F}_i\mathbf{\beta}+\mathbf{Q}\mathbf{\eta}_{i}\qquad i=1,\ldots,n\\ \tilde{\mathbf{y}}_i&\equiv\tilde{\mathbf{F}}_i\mathbf{\beta}+\tilde{\mathbf{\eta}}_{i}, \end{align*} where $\mathbf{F}_i$ is the $(M\times L_b)$-matrix of the observations on the time-varying regressors, i.e. the columns of $\mathbf{Z}_i$ that are not wiped out by $\mathbf{Q}$. Stacking the observations over the $n$ units gives $$ \underset{(Mn\times 1)}{\tilde{\mathbf{y}}}:=\left( \begin{array}{c} \tilde{\mathbf{y}}_1 \\ \vdots \\ \tilde{\mathbf{y}}_n \\ \end{array} \right)\qquad\underset{(Mn\times L_b)}{\tilde{\mathbf{F}}}:=\left( \begin{array}{c} \tilde{\mathbf{F}}_1 \\ \vdots \\ \tilde{\mathbf{F}}_n \\ \end{array} \right) $$

The FE estimator is simply OLS applied to these $Mn$ observations: \begin{align*} \widehat{\mathbf{\beta}}_{\text{FE}}&=(\tilde{\mathbf{F}}'\tilde{\mathbf{F}})^{-1}\tilde{\mathbf{F}}'\tilde{\mathbf{y}} \end{align*}

To see the equality between FE and least squares dummy variables (LSDV), stack the observations a bit further: \begin{equation} \underset{(Mn\times 1)}{\mathbf{y}}:=\left( \begin{array}{c} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_n \\ \end{array} \right)\;\underset{(Mn\times L_b)}{\mathbf{F}}:=\left( \begin{array}{c} \mathbf{F}_1 \\ \vdots \\ \mathbf{F}_n \\ \end{array} \right) \end{equation} and \begin{equation} \underset{(Mn\times 1)}{\mathbf{\eta}}:=\left( \begin{array}{c} \mathbf{\eta}_1 \\ \vdots \\ \mathbf{\eta}_n \\ \end{array} \right)\; \underset{(n\times 1)}{\mathbf{\alpha}}:=\left( \begin{array}{c} \alpha_1 \\ \vdots \\ \alpha_n \\ \end{array} \right). \end{equation}

Further, let $$ \underset{(Mn\times n)}{\mathbf{D}}:=\mathbf{I}_n\otimes\mathbf{1}_M=\left( \begin{array}{ccc} \mathbf{1}_M & & \mathbf{O} \\ & \ddots & \\ \mathbf{O}& & \mathbf{1}_M \\ \end{array} \right). $$

Then, the linear panel data model under an error-component assumption is obtained in matrix notation as $$ \mathbf{y}=\mathbf{D}\mathbf{\alpha}+\mathbf{F}\mathbf{\beta}+\mathbf{\eta}, $$ a dummy-variable model.

That is, we can also obtain an estimator of $\mathbf{\beta}$ from an OLS regression on the regressors and $n$ individual specific effects.

Now, note that the Frisch-Waugh-Lovell Theorem says that the OLS estimator of $\mathbf{\beta}$ can be found by regressing $\mathbf{M}_{\mathbf{D}}\mathbf{y}$ on $\mathbf{M}_{\mathbf{D}}\mathbf{F}$, where $$\underset{(Mn\times Mn)}{\mathbf{M}_{\mathbf{D}}}:=\mathbf{I}-\mathbf{D}(\mathbf{D}'\mathbf{D})^{-1}\mathbf{D}'$$ Using symmetry and idempotency of $\mathbf{M}_{\mathbf{D}}$ gives \begin{equation} \widehat{\mathbf{\beta}}_{\text{LSDV}}=(\mathbf{F}'\mathbf{M}_{\mathbf{D}}\mathbf{F})^{-1}\mathbf{F}'\mathbf{M}_{\mathbf{D}}\mathbf{y} \end{equation}
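
The step from the Frisch-Waugh-Lovell regression to this formula can be spelled out. Symmetry and idempotency mean $\mathbf{M}_{\mathbf{D}}'\mathbf{M}_{\mathbf{D}}=\mathbf{M}_{\mathbf{D}}\mathbf{M}_{\mathbf{D}}=\mathbf{M}_{\mathbf{D}}$, so applying the OLS formula to the transformed data gives

$$\left[(\mathbf{M}_{\mathbf{D}}\mathbf{F})'(\mathbf{M}_{\mathbf{D}}\mathbf{F})\right]^{-1}(\mathbf{M}_{\mathbf{D}}\mathbf{F})'(\mathbf{M}_{\mathbf{D}}\mathbf{y})=(\mathbf{F}'\mathbf{M}_{\mathbf{D}}'\mathbf{M}_{\mathbf{D}}\mathbf{F})^{-1}\mathbf{F}'\mathbf{M}_{\mathbf{D}}'\mathbf{M}_{\mathbf{D}}\mathbf{y}=(\mathbf{F}'\mathbf{M}_{\mathbf{D}}\mathbf{F})^{-1}\mathbf{F}'\mathbf{M}_{\mathbf{D}}\mathbf{y}.$$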

Now, \begin{align*} \mathbf{M}_{\mathbf{D}}&=\mathbf{I}_{Mn}-(\mathbf{I}_n\otimes\mathbf{1}_M)[(\mathbf{I}_n\otimes\mathbf{1}_M)'(\mathbf{I}_n\otimes\mathbf{1}_M)]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M)'\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)[(\mathbf{I}_n\otimes\mathbf{1}_M')(\mathbf{I}_n\otimes\mathbf{1}_M)]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)[\mathbf{I}_n\otimes\mathbf{1}_M'\mathbf{1}_M]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)[\mathbf{I}_n\otimes M]^{-1}(\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)\left[\mathbf{I}_n\otimes \frac{1}{M}\right](\mathbf{I}_n\otimes\mathbf{1}_M')\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-(\mathbf{I}_n\otimes\mathbf{1}_M)\left[\mathbf{I}_n\otimes \frac{1}{M}\mathbf{1}_M'\right]\\ &=\mathbf{I}_{n}\otimes\mathbf{I}_{M}-\mathbf{I}_n\otimes\mathbf{1}_M\frac{1}{M}\mathbf{1}_M'\\ &=\mathbf{I}_{n}\otimes\left(\mathbf{I}_{M}-\frac{1}{M}\mathbf{1}_M\mathbf{1}_M'\right)\\ &=\mathbf{I}_n\otimes\mathbf{Q} \end{align*}
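
The Kronecker algebra can be checked numerically. The following is a small sketch for a balanced panel with arbitrarily chosen dimensions (`n` units, `M` observations each), using base R's `kronecker()`:

```r
n <- 3                                   # units (arbitrary choice)
M <- 4                                   # observations per unit (arbitrary choice)
ones <- rep(1, M)

Q <- diag(M) - tcrossprod(ones) / M                 # I_M - (1/M) 1 1'
D <- kronecker(diag(n), matrix(ones, ncol = 1))     # (Mn x n) dummy matrix

# Residual maker of D, computed directly from its definition
M_D <- diag(n * M) - D %*% solve(crossprod(D)) %*% t(D)

# Block-diagonal matrix with Q on the diagonal
I_kron_Q <- kronecker(diag(n), Q)

print(all.equal(M_D, I_kron_Q))
```

The comparison uses `all.equal()` rather than `identical()` because the two matrices agree only up to floating-point error.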

Thus, \begin{align*} \mathbf{M}_{\mathbf{D}}\mathbf{F}&=(\mathbf{I}_n\otimes\mathbf{Q})\mathbf{F}\\ &=\left( \begin{array}{ccc} \mathbf{Q} & & \\ & \ddots & \\ & & \mathbf{Q} \\ \end{array} \right)\mathbf{F}\\ &=\tilde{\mathbf{F}}, \end{align*} and likewise $\mathbf{M}_{\mathbf{D}}\mathbf{y}=\tilde{\mathbf{y}}$, so that $$\widehat{\mathbf{\beta}}_{\text{LSDV}}=\widehat{\mathbf{\beta}}_{\text{FE}}.$$

Incidentally, while the notation works with balanced panel data, the result also goes through in the unbalanced case, as one can either check with more complicated notation or this numerical illustration:

    library(plm)

    # panel dimensions
    n <- 10
    m <- sample(2:4, n, replace = TRUE) # unbalanced panel: 2 to 4 observations per unit

    # some data
    alpha <- runif(n)
    beta <- -2
    y <- X <- y.d <- X.d <- c()
    D <- matrix(0, sum(m), n) # for the dummy variable matrix
    row.counter <- 0
    for (i in 1:n) {
      X.n <- runif(m[i], i, i + 1)
      X.d <- c(X.d, X.n - mean(X.n))  # unit-demeaned regressor
      X <- c(X, X.n)
      y.n <- alpha[i] + X.n * beta + rnorm(m[i])
      y <- c(y, y.n)
      y.d <- c(y.d, y.n - mean(y.n))  # unit-demeaned outcome

      # fill in the dummy column of unit i
      D[(row.counter + 1):(row.counter + m[i]), i] <- rep(1, m[i])
      row.counter <- row.counter + m[i]
    }

Output:

    > # plm
    > paneldata <- data.frame(rep(1:n, times=m), unlist(sapply(m, function(i) 1:i)), y, X) # first two columns are for plm to understand the panel .... [TRUNCATED] 

    > FE <- plm(y~X, data = paneldata, model = "within")

    > # results:
    > coef(FE)  # the slope coefficient
            X 
    -2.331847 

    > fixef(FE) # the intercepts
          1       2       3       4       5       6       7       8       9      10 
    0.99396 2.30328 1.90957 2.22670 1.09438 3.10411 2.03265 4.39759 4.42384 4.15294 

    > # FWL
    > lm(y.d~X.d-1) # just the slope in this formulation

    Call:
    lm(formula = y.d ~ X.d - 1)

    Coefficients:
       X.d  
    -2.332  

    > # LSDV
    > lm(y~D+X-1) # intercepts and slope

    Call:
    lm(formula = y ~ D + X - 1)

    Coefficients:
        D1      D2      D3      D4      D5      D6      D7      D8      D9     D10       X  
     0.994   2.303   1.910   2.227   1.094   3.104   2.033   4.398   4.424   4.153  -2.332

Solved – Within transformation in fixed effect regression model

The two are equivalent.

The second version uses the Frisch-Waugh-Lovell theorem, which says that you can compute a subset of the coefficients of a regression (here, $\hat\beta$) by (1) regressing $y$ on the other regressors (here, $D$) and saving the residuals (here, the time-demeaned $y$, or $M_{[D]}y$, because regression on a unit-specific constant just demeans the variable within each unit), then (2) regressing $X$ on $D$ and saving the residuals $M_{[D]}X$, and (3) regressing the residuals on each other, $M_{[D]}y$ on $M_{[D]}X$.
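
The three steps can be sketched in base R on simulated data. The panel below and all variable names are hypothetical and serve only to illustrate the theorem:

```r
# Simulated balanced panel (hypothetical data)
set.seed(42)
n  <- 8                                   # units
Tt <- 6                                   # periods per unit
id <- factor(rep(seq_len(n), each = Tt))
x  <- rnorm(n * Tt) + as.integer(id)
y  <- 0.3 * as.integer(id) - 2 * x + rnorm(n * Tt)

# (1) residuals from regressing y on the unit dummies
e_y <- resid(lm(y ~ id - 1))
# (2) residuals from regressing x on the unit dummies
e_x <- resid(lm(x ~ id - 1))
# (3) regress the residuals on each other
b_fwl <- coef(lm(e_y ~ e_x - 1))

# Full dummy-variable regression for comparison
b_lsdv <- coef(lm(y ~ x + id - 1))["x"]

print(c(fwl = unname(b_fwl), lsdv = unname(b_lsdv)))
```

In this sketch the residuals from step (1) are simply `y - ave(y, id)`, i.e. the unit-demeaned outcome, which is why the within transformation avoids estimating the dummy coefficients explicitly.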

The second version is indeed much more widely used, because typical panel data sets may have thousands of panel units, so that the first approach would require you to run a regression with thousands of regressors, which is not a good idea numerically even nowadays with fast computers.
