Fixed Effects Model – Analyzing Time Fixed-Effects and Constantly Changing Variables

econometrics, fixed-effects-model, least squares, panel data

According to Wooldridge (2016), "When we include a full set of year dummies — that is, year dummies for all years but the first — we cannot estimate the effect of any variable whose change across time is constant". He gives the example of work experience, which increases by one each year if each person works every year. However, people can start with very different amounts of work experience: Person 1 might have 1 year of experience at the start of the data sample, while Person 2 might have 22 years.

I recreated this type of problem with a dummy data set and the age of a country:

library(foreign)
library(dplyr)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")

# Keep countries A and B only
Panel <- Panel %>%
  filter(country %in% c("A", "B"))

# country_age rises by 1 per year: A is 22 in 1990, B is 1 in 1990
Panel <- Panel %>%
  mutate(country_age = case_when(country == "A" ~ year - 1968,
                                 country == "B" ~ year - 1989))

lmod <- lm(y ~ x1 + country_age + factor(year) - 1, data = Panel)
summary(lmod)

To my surprise, I get a coefficient for each of the year dummies as well as for the country_age variable, which increases by exactly 1 per year for both countries. Doesn't that contradict what Wooldridge writes? country_age changes by a constant amount over time, and I can still get an estimate for it.

Only if I use the same time series for both countries like this,

library(foreign)
library(dplyr)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")

# Keep countries A and B only
Panel <- Panel %>%
  filter(country %in% c("A", "B"))

# Same series for both countries: country_age runs from 1 (1990) to 10 (1999)
Panel <- Panel %>%
  mutate(country_age = year - 1989)

lmod <- lm(y ~ x1 + country_age + factor(year) - 1, data = Panel)
summary(lmod)

do I fail to get an estimate for one of the year dummies.

Am I misinterpreting Wooldridge's statement? I don't see where I would be, because I more or less recreated the exact example he gives.

Or is the statement in Wooldridge flawed (which I highly doubt), so that with time fixed effects it is only impossible to estimate variables that have exactly the same time series for all entities (e.g., macroeconomic variables)?

Best Answer

My hunch, without having checked Wooldridge, is that he refers to a situation in which there are also individual-specific (country-specific, in your example) effects in addition to the time effects.

I ran

library(plm)
plm(y ~ x1 + country_age, data = Panel, effect = "twoways", model = "within")
plm(y ~ x1 + country_age, data = Panel, effect = "time", model = "within")

on your first data set, and I do get a coefficient on country_age in the latter case, but not in the former.

> plm(y ~ x1 + country_age, data = Panel, effect = "twoways", model = "within")

Model Formula: y ~ x1 + country_age

Coefficients:
        x1 
2409669178 

> plm(y ~ x1 + country_age, data = Panel, effect = "time", model = "within")

Model Formula: y ~ x1 + country_age

Coefficients:
         x1 country_age 
 2409669178    91766658 
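The same pattern can be sketched with plain lm() and dummy variables. The data below are simulated and purely hypothetical (the names d, unit, age, fit_time and fit_twoway are not from the question); only the structure of age, with starting values 22 and 1, mirrors country_age. The regressor is placed last in the formula so that lm() drops it, rather than one of the dummies, when the design is rank deficient.

```r
# Hypothetical simulated panel: 2 units, 10 years (not the Panel101 data)
set.seed(1)
d <- expand.grid(year = 1990:1999, unit = c("A", "B"))
d$age <- ifelse(d$unit == "A", d$year - 1968, d$year - 1989)  # starts at 22 and at 1
d$x1  <- rnorm(nrow(d))
d$y   <- 1 + 0.5 * d$x1 + rnorm(nrow(d))

# Time effects only: age differs across units within a year, so it is identified
fit_time <- lm(y ~ x1 + factor(year) + age, data = d)

# Time and unit effects: age is an exact linear combination of the dummies
fit_twoway <- lm(y ~ x1 + factor(unit) + factor(year) + age, data = d)

coef(fit_time)["age"]    # a finite estimate
coef(fit_twoway)["age"]  # NA, dropped because of perfect collinearity
```

Which coefficient is reported as NA depends on the column order, because lm() drops the later of two collinear columns.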

Notice that including an individual-specific fixed effect amounts to unitwise demeaning of all regressors. If a regressor changes by the same constant amount each period in every unit, the demeaned variable will be collinear with the unitwise demeaned time effects.
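To spell this argument out, here is a short derivation under assumed notation that is not in the original: unit $i$ starts at $a_i$, the common increment is $c$, periods are $t = 1, \dots, T$, and $d_{s,t}$ is the dummy for period $s$. Write the regressor as

$$x_{it} = a_i + c\,t, \qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it} = a_i + c\,\bar{t}, \qquad \bar{t} = \frac{T+1}{2}.$$

Unitwise demeaning removes the starting value $a_i$ entirely:

$$x_{it} - \bar{x}_i = c\,(t - \bar{t}).$$

The unitwise demeaned time dummies are $d_{s,t} - 1/T$, and since $\sum_{s} s\, d_{s,t} = t$ and $\sum_{s} s/T = \bar{t}$,

$$c\,(t - \bar{t}) = c \sum_{s=1}^{T} s \left(d_{s,t} - \frac{1}{T}\right),$$

so the demeaned regressor is an exact linear combination of the demeaned time dummies, whatever the starting values $a_i$ are.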

Consider the following artificial regressor matrix of a panel data model with individual-specific effects (the first two columns, i.e. two "countries"), time effects (3rd to 6th column) and a constant-change regressor with different starting points (7th column).

We observe that the regressor matrix has rank 5, so that even with different starting points, the time effects and the constant-change regressor are collinear (one rank is already lost due to collinearity of the individual and time effects, which is why Wooldridge drops the time dummy for the first year). Equivalently, even with different starting points and after dropping column 3, we can combine columns 1, 2, 4, 5 and 6 into column 7 via

$$x_7 = 6\times x_1+7\times x_2+2\times x_4 +4\times x_5+6\times x_6.$$

X <- matrix(c(rep(1,4), rep(0,4), rep(0,4), rep(1,4), # dummies for the units
               rep(c(1,0,0,0),2), rep(c(0,1,0,0),2), rep(c(0,0,1,0),2), rep(c(0,0,0,1),2), # dummies for the time points
               seq(6, by=2, length.out=4), seq(7, by=2, length.out=4)), ncol=7) # constant-increase regressor
X
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    1    0    0    0    6
[2,]    1    0    0    1    0    0    8
[3,]    1    0    0    0    1    0   10
[4,]    1    0    0    0    0    1   12
[5,]    0    1    1    0    0    0    7
[6,]    0    1    0    1    0    0    9
[7,]    0    1    0    0    1    0   11
[8,]    0    1    0    0    0    1   13
> qr(X)$rank
[1] 5
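As a quick check of the collinearity claim, the weights that map columns 1, 2, 4, 5 and 6 into column 7 can be solved for numerically. This is a minimal sketch that rebuilds the same matrix X; the names B and w are introduced here only for illustration.

```r
# Rebuild the 8 x 7 regressor matrix from the text
X <- matrix(c(rep(1, 4), rep(0, 4), rep(0, 4), rep(1, 4),      # unit dummies
              rep(c(1, 0, 0, 0), 2), rep(c(0, 1, 0, 0), 2),
              rep(c(0, 0, 1, 0), 2), rep(c(0, 0, 0, 1), 2),    # time dummies
              seq(6, by = 2, length.out = 4),
              seq(7, by = 2, length.out = 4)),                 # constant-increase regressor
            ncol = 7)

# Drop column 3 (first-period dummy) and solve for the weights on the rest
B <- X[, c(1, 2, 4, 5, 6)]
w <- qr.solve(B, X[, 7])    # least-squares solution, exact here because column 7 lies in the span

round(w, 10)                # 6 7 2 4 6
max(abs(B %*% w - X[, 7]))  # numerically zero
```

The two unit weights reproduce the starting values (6 and 7), and the time weights accumulate the common increment of 2 per period.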

This also shows why, with the same starting points (modify the last four elements of the last column to 6, 8, 10, 12 to try), the regressor cannot be estimated alongside time effects even without individual-specific effects: just as individual-specific effects do not go together with time-invariant regressors, a regressor needs variation across units to be fitted next to time effects.

Now, with the same starting point and the same increases, the regressor takes the same value across units at each point in time and hence gets dropped when fitting time effects (here y is a response vector of length 8 and X has the modified last column):

> lm(y~X[,3:7]-1)

Call:
lm(formula = y ~ X[, 3:7] - 1)

Coefficients:
X[, 3:7]1  X[, 3:7]2  X[, 3:7]3  X[, 3:7]4  X[, 3:7]5  
 -1.16909   -0.51927    0.02666    0.41310         NA

Equivalently, columns 3 to 6 alone can then be linearly combined into column 7.
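A minimal sketch of that last statement, assuming the modified last column (6, 8, 10, 12 in both units); the names time_dummies, x_same and Z are introduced here only for illustration:

```r
# Time dummies for 4 periods, stacked for two units (same as columns 3 to 6 of X)
time_dummies <- diag(4)[rep(1:4, 2), ]

# Same starting point (6) and same increment (2) in both units
x_same <- rep(seq(6, by = 2, length.out = 4), 2)

Z <- cbind(time_dummies, x_same)
qr(Z)$rank  # 4 rather than 5: the regressor adds no new information

# The weights are simply the regressor's value in each period
all.equal(drop(time_dummies %*% c(6, 8, 10, 12)), x_same)  # TRUE
```

Because the regressor only varies over time, each time dummy's weight is just the value the regressor takes in that period.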