Solved – Including both individual and state fixed effects

fixed-effects-model, regression

Consider the following regression model:
$$y_{it}=x_{it}'\beta+\alpha_{i}+\upsilon_{it}$$
where we have data on $N$ individuals for $T$ time periods. If we estimate $\beta$ by fixed effects, we use only within-individual variation. As a result, the effects of time-invariant regressors are not identified.

Now suppose $x$ contains the individual's state of residence. If we include state dummies, what do the coefficients on those dummies identify? Where does the variation come from? In the extreme case where all the variation in state of residence is across individuals (i.e., no individual moves between states), will these coefficients be identified?

Best Answer

If you include individual fixed effects, the estimated coefficients on the state dummies are based on within-individual variation, i.e., on the people who move across state lines. If no one switches states, the state dummies are not identified.
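One way to see this is through the within transformation. As a sketch, separate the state dummies $d_{it}$ (with coefficients $\gamma$) from the other regressors; both symbols are introduced here for illustration and do not appear in the question:

$$y_{it}=x_{it}'\beta+d_{it}'\gamma+\alpha_{i}+\upsilon_{it}$$

Subtracting each individual's time average removes $\alpha_i$:

$$y_{it}-\bar y_{i}=(x_{it}-\bar x_{i})'\beta+(d_{it}-\bar d_{i})'\gamma+(\upsilon_{it}-\bar\upsilon_{i})$$

For an individual who never moves, $d_{it}=\bar d_{i}$ in every period, so that person contributes only zeros to the demeaned dummies. If nobody moves, $d_{it}-\bar d_{i}=0$ for all $i$ and $t$, the regressor vanishes, and $\gamma$ cannot be estimated.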

Related Solutions

Solved – Panel Data Fixed Effects Interpretation

These are also called "individual-specific intercepts", because one way to estimate the FE model is to run a "least-squares dummy variables regression", in which one regresses $y$ on $x$ and $n$ dummy variables, where each individual in the panel has one dummy that takes the value one if an observation belongs to that unit (person, household, firm, ...). The $\hat\alpha_i$ then estimate these intercepts, which may be interpreted as usual intercepts in regressions, with the only difference that each intercept is specific to a single unit.
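The equivalence between the dummy-variable regression and the usual within (demeaning) estimator can be checked on made-up data. This is a minimal sketch in base R; the panel dimensions, the parameter values, and all variable names are assumptions chosen for illustration.

```r
# Minimal sketch with simulated data: LSDV and the within estimator
# give the same slope. All names and values below are illustrative.
set.seed(1)
n <- 4                                  # hypothetical number of units
periods <- 6                            # hypothetical number of time periods
id <- rep(seq_len(n), each = periods)   # unit identifier for each row
alpha <- c(-1, 0, 2, 5)[id]             # unit-specific intercepts
x <- rnorm(n * periods) + alpha         # regressor correlated with alpha
y <- 1.5 * x + alpha + rnorm(n * periods, sd = 0.1)

# LSDV: one dummy per unit, no common intercept
lsdv <- lm(y ~ 0 + factor(id) + x)

# Within estimator: subtract unit means from y and x, then regress
y.w <- y - ave(y, id)
x.w <- x - ave(x, id)
fe.within <- lm(y.w ~ 0 + x.w)

coef(lsdv)["x"]          # slope from the dummy-variable regression
coef(fe.within)["x.w"]   # same slope from the demeaned regression
coef(lsdv)[1:n]          # the estimated unit-specific intercepts
```

The first `n` coefficients of `lsdv` are the $\hat\alpha_i$ discussed above; the demeaned regression recovers the same slope without estimating them explicitly.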

Solved – can time trend variables be fixed effects

This is a model in which you control for a state-by-state linear time trend as well as variations from that trend that are common to all states at each individual time.

To see this, consider some synthetic data generated according to this model. (The method to create them is described at the end of this post.) They consist of five observations in each of three states over eight consecutive years. No covariates $X_{ist}$ are involved, because their inclusion will shed no light on the issue of modeling time effects.

Because you are interested ultimately in the effects of the $D_{st}$ variable, this plot distinguishes the symbols by its values. They occur only in years 4 and 5. On the face of it, they are not unusual.

We could fit a model with linear time trends in each state, controlling for $D_{st}$:

$$y_{ist}=\alpha_{0s}+\alpha_{1s}t + \theta D_{st} + \epsilon_{ist}$$

The $\lambda_t$ term, which captures variation common to all states in year $t$, is omitted from this fit.

Here are the fitted trends, one per state, controlling for $D_{st}$:

You can see the states do experience different rates of change over time. Moreover, there is some collective variation around those fitted lines. In particular, the values for State 1 in years 4 and 5 are unusually high, and these are the ones associated with $D_{st}=1.$ Should we attribute this to a real effect or to some form of variation that affects all states, independently of $D_{st}$?

Let's examine the residuals:

I have collected the residuals into boxplots (a) by time (the black-and-white wide boxplots in the background) and (b) by time and state (the colored narrower boxplots in the foreground). You can see that the residuals significantly change from one time to the next, but those for all states change in the same manner. We needed to control for this common year-to-year variation in order to determine that the unusually high values for $D_{st}=1$ in years 4 and 5 in state 1 are meaningful.

The software might complain when you fit the model. This is because the presence of the $\lambda_t$ term, which provides a separate mean value for each year, effectively establishes a "baseline" to which all the states are compared. This creates a redundancy, exactly in the same way any categorical variable creates one, requiring us to interpret all temporal changes as being relative to the baseline. The OLS procedure in R, lm, elects not to fit a slope for the last state:

lm(formula = Value ~ -1 + State + State:Time + Year + D.st, data = X)

(Year is a categorical version of the numerical Time variable.)

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
StateS.1       1.58315    0.22167   7.142 1.17e-10 ***
StateS.2       2.35555    0.21895  10.759  < 2e-16 ***
StateS.3       2.41142    0.18867  12.781  < 2e-16 ***
Year2          2.10770    0.19827  10.631  < 2e-16 ***
Year3         -0.20172    0.20507  -0.984    0.327    
Year4          0.11881    0.23027   0.516    0.607    
Year5          2.59317    0.24377  10.638  < 2e-16 ***
Year6          2.18162    0.24749   8.815 2.42e-14 ***
Year7          3.85025    0.26703  14.419  < 2e-16 ***
Year8          2.26431    0.28843   7.851 3.38e-12 ***
D.st           5.45442    0.23999  22.728  < 2e-16 ***
StateS.1:Time -1.14550    0.05237 -21.874  < 2e-16 ***
StateS.2:Time -0.67605    0.05237 -12.909  < 2e-16 ***
StateS.3:Time       NA         NA      NA       NA
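The single undefined coefficient can be traced to one exact linear dependency among the regressors. As a sketch, write $S_s$ for the State indicators and $Y_k$ for the Year indicators (notation introduced here for illustration, not taken from the output). Because every observation belongs to exactly one state and one year,

$$\sum_{s=1}^{3} S_s\,t \;=\; t \;=\; \sum_{k=1}^{8} k\,Y_k \;=\; \sum_{s=1}^{3} S_s+\sum_{k=2}^{8}(k-1)\,Y_k,$$

where the last step uses $Y_1=\sum_{s} S_s-\sum_{k\ge 2}Y_k$. The left side is the sum of the three State:Time columns, and the right side combines the State and Year columns already in the model. One of the three slopes is therefore redundant, and the remaining two are measured relative to the year effects.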

Incidentally, the coefficient of $D_{st}$ used to generate these data was set at $\theta=6$. The OLS fit in this example is $\hat\theta=5.45\pm 0.24.$ That's pretty accurate.

It might be helpful to see how these data were generated. I created arrays to hold the values of the parameters and used those to compute the Value field in a dataframe X of rows (indexed by $i$) that contain the State ($s$), numerical Time ($t$), and 0-1 numerical d.st codes ($D_{st}$):

X$Value <- with(X,states.intercept[State] + 
                  states.slope[State] * Time + 
                  effects.time[Time] + 
                  effects.main * c(d.st) +
                  errors)
X$Year <- factor(X$Time) # Used by `lm` for individual time terms lambda_t

Here, states.intercept is $\alpha_{0s}$, states.slope is $\alpha_{1s}$, effects.time is $\lambda_t$, effects.main is $\theta$, and errors are iid Normally distributed random values to realize $\epsilon_{ist}$.
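The snippet above does not show how X and the parameter arrays were set up. The following is a self-contained sketch of one way to do it, under stated assumptions: the design (three states, eight years, five observations per cell, $D_{st}=1$ for State 1 in years 4 and 5) and $\theta=6$ follow the description in this post, while the seed, intercepts, slopes, year effects, and error scale are hypothetical placeholders, so the estimates will not reproduce the table shown earlier.

```r
# Sketch of the data-generating process. Only the design and
# effects.main = 6 come from the post; other values are hypothetical.
set.seed(17)                                  # arbitrary seed
n.states <- 3; n.years <- 8; n.obs <- 5       # design described in the text
X <- expand.grid(Obs = seq_len(n.obs),
                 Time = seq_len(n.years),
                 State = seq_len(n.states))
# Treatment indicator: State 1 in years 4 and 5
X$d.st <- as.numeric(X$State == 1 & X$Time %in% 4:5)

states.intercept <- c(2, 3, 1)                # hypothetical alpha_0s
states.slope     <- c(-0.5, 0, 0.6)           # hypothetical alpha_1s
effects.time     <- rnorm(n.years)            # hypothetical lambda_t
effects.main     <- 6                         # theta, as stated in the post
errors           <- rnorm(nrow(X), sd = 0.5)  # hypothetical error scale

X$Value <- with(X, states.intercept[State] +
                  states.slope[State] * Time +
                  effects.time[Time] +
                  effects.main * d.st +
                  errors)
X$Year  <- factor(X$Time)                     # categorical time for lambda_t
X$State <- factor(paste0("S.", X$State))      # labels S.1, S.2, S.3

# Same model structure as the fit reported above
fit <- lm(Value ~ -1 + State + State:Time + Year + d.st, data = X)
coef(fit)
```

With this setup `lm` should again leave exactly one coefficient undefined, reflecting the redundancy between the state slopes and the year effects.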
