Solved – Using year fixed effects on data with yearly observations

categorical data, econometrics, fixed-effects-model, panel data

I have a panel data set with yearly observations of various firms over a period of 5 years. I am running a fixed effects model in Stata using xtreg. Is it problematic to include a dummy variable for individual years, since I have yearly observations? (I would exclude a reference dummy.)

Here is the Stata code I would be using:

xtreg y x i.year, fe

Best Answer

It's not problematic; it is even a good idea. The year dummies will pick up any variation in the outcome that happens over time and is not attributed to your other explanatory variables. A common source of confusion with fixed effects estimation in Stata is the xtset command, which lets you set both a panel and a time variable. Only the panel variable is used to eliminate the individual (in this case firm) fixed effects; setting the time variable does nothing about time fixed effects. So xtreg with the fe option will perform the within transformation using the specified panel id, but if you want to control for year fixed effects you need to include the dummies, as you suggest.
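As a rough illustration of this point, the sketch below fits the model with and without year dummies. It uses nlswork, an example data set shipped with Stata, rather than the asker's firm data, and the choice of regressors is an assumption made only for the demonstration.

```stata
* Sketch only: nlswork stands in for the asker's firm panel
webuse nlswork, clear

* declaring year here does NOT add year fixed effects
xtset idcode year

* individual fixed effects only
xtreg ln_wage age union, fe

* individual fixed effects plus year dummies
xtreg ln_wage age union i.year, fe

* joint test of whether the year dummies matter
testparm i.year
```

If the joint test rejects, the year dummies are picking up common shocks that the first specification leaves in the error term.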

Related Solutions

Panel Data Regression – Correct Way to Deal with Multiple Fixed Effects

When you use time dummies, you don't need a time dummy for every individual separately but for every year. So this leaves you with 28 time dummies and 997 individual dummies (always omitting the first year and first individual to avoid the dummy variable trap).

The solution to your problem is much simpler than what the other answer suggested here. If you read any introductory text on panel data (you can start with these lecture notes), you should acquaint yourself with the fixed effects estimator which is sometimes referred to as the within estimator as well.

The procedure is as follows:

1. Average each variable over time for each individual, e.g. $\overline{y}_i = \frac{1}{T}\sum_{t=1}^{T}y_{it}$ and $\overline{x}_i = \frac{1}{T}\sum_{t=1}^{T}x_{it}$, with $T = 29$ here.
2. Subtract this individual mean from each observation, $\tilde{y}_{it} = y_{it} - \overline{y}_{i}$ and $\tilde{x}_{it} = x_{it} - \overline{x}_i$.
3. Regress $\tilde{y}_{it}$ on $\tilde{x}_{it}$ and your year dummies (in an unbalanced panel the year dummies have to be demeaned in the same way), and cluster the standard errors on the individual's ID to account for serial correlation.
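The steps above can be sketched by hand in Stata. This is a minimal illustration, not the asker's setup: it assumes the nlswork example data, and the prefixes m_, d_ and yd are hypothetical names chosen here.

```stata
* Sketch only: nlswork is Stata's shipped example data, not the asker's data
webuse nlswork, clear

* restrict to complete cases so the means use the same rows as the regression
keep if !missing(ln_wage, age, union)

* steps 1 and 2: individual means, then deviations from them
foreach v of varlist ln_wage age union {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double d_`v' = `v' - m_`v'
}

* nlswork is unbalanced, so the year dummies are demeaned as well
tabulate year, generate(yd)
foreach v of varlist yd* {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double d_`v' = `v' - m_`v'
}

* step 3: OLS on the demeaned data, clustering on the individual
* (Stata drops one redundant demeaned year dummy automatically)
regress d_ln_wage d_age d_union d_yd*, vce(cluster idcode)

* canned within estimator for comparison
xtset idcode year
xtreg ln_wage age union i.year, fe vce(cluster idcode)
```

The coefficients on age and union should agree between the two regressions; the reported standard errors may differ slightly because the two commands can apply different degrees-of-freedom adjustments.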

Even though it is not very apparent, Mundlak (1978) has shown that this procedure is equivalent to including a dummy for every individual minus 1 (again omitting the first individual), as you propose. The advantage is obvious: you don't need all those dummies when you use this three-step procedure, which is called the "within transformation".
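To see why demeaning removes the individual effects, here is a short sketch. The model and the symbols $c_i$ (individual effect), $\lambda_t$ (year effect) and $\epsilon_{it}$ (error) are notation assumed for this illustration, and a balanced panel is assumed so that every individual is observed in all $T$ years:

$$y_{it} = x_{it}'\beta + c_i + \lambda_t + \epsilon_{it}$$

Averaging over $t$ for each individual gives

$$\overline{y}_i = \overline{x}_i'\beta + c_i + \overline{\lambda} + \overline{\epsilon}_i$$

and subtracting this from the first equation yields

$$\tilde{y}_{it} = \tilde{x}_{it}'\beta + (\lambda_t - \overline{\lambda}) + \tilde{\epsilon}_{it}, \qquad \tilde{\epsilon}_{it} = \epsilon_{it} - \overline{\epsilon}_i$$

The $c_i$ cancel, while the year effects survive as $\lambda_t - \overline{\lambda}$, which is what the year dummies (together with the constant) pick up.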

Most statistical software packages have canned routines for this type of estimation, as it is fairly standard. In Stata you would simply declare your data to be a panel data set, which allows you to use the corresponding panel data regression and data analysis commands. For example:

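* load the example data set shipped with Stata
webuse nlswork
* declare the panel (idcode) and time (year) variables
xtset idcode year
* fe requests the within estimator; without it xtreg defaults to random effects
xtreg ln_wage age union i.year, fe vce(cluster idcode)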

Here i.year automatically inserts your year dummies into the regression. So with 29 years you lose 28 degrees of freedom, which isn't awful. Good introductions to the topic are:

Wooldridge, J. (2008) "Introductory Econometrics", 4th Edition, South Western College
Baltagi, B.H. (2013) "Econometric Analysis of Panel Data", 5th Edition, John Wiley & Sons
Wooldridge, J. (2010) "Econometric Analysis of Cross Section and Panel Data", 2nd Edition, MIT Press

The last reference is for advanced students.

Solved – Random effects vs fixed effects for analysis of panel data (econometrics)

This specification allows you to capture the time-invariant heterogeneity. The difference between fixed and random effects is the following. Consider the model $$y_{it} = \alpha + X'_{it}\beta + c_i + \epsilon_{it}$$ where $y_{it}$ is the outcome, $X_{it}$ are time-varying controls, $c_i$ are the firms' characteristics that do not change over time, $\epsilon_{it}$ is an error term, and $i$ and $t$ index firms and years, respectively.

Fixed effects estimation eliminates the $c_i$ by utilizing the within transformation or first differencing (for details, see for instance these lecture notes). Random effects, on the other hand, does not remove the $c_i$ but leaves them in the error term. This of course only works if all your explanatory variables $X$ are uncorrelated with $c_i$. The random effects estimator then uses a matrix weighted average of the within and between variation of your data. The fixed effects estimator only uses the within (i.e. the intra-firm) variation. This makes random effects more efficient, meaning that the standard errors are smaller, and it lets you include time-invariant variables, which is useful if you are interested in their coefficients.
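One common way to make this weighting concrete is to view random effects as OLS on partially demeaned data. The sketch below assumes a balanced panel with $T$ periods and the standard random effects assumptions; the symbols $\theta$, $v_{it}$, $\sigma_c^2$ and $\sigma_\epsilon^2$ are notation introduced here for illustration, with $v_{it} = c_i + \epsilon_{it}$:

$$y_{it} - \theta\overline{y}_i = (1-\theta)\alpha + (X_{it} - \theta\overline{X}_i)'\beta + (v_{it} - \theta\overline{v}_i), \qquad \theta = 1 - \sqrt{\frac{\sigma_\epsilon^2}{\sigma_\epsilon^2 + T\sigma_c^2}}$$

With $\theta = 0$ (no variance in $c_i$) this is pooled OLS, and as $\theta \to 1$ (large $T$ or large $\sigma_c^2$) it approaches the within transformation used by fixed effects.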

In practice, the assumption of random effects is often implausible. You can directly test this using the Hausman test. Whether or not the $X_{it}$ are correlated with $c_i$, the fixed effects estimator is consistent. Random effects is only consistent under the assumption stated above. The Hausman test compares the two sets of estimates and, broadly speaking, if they do not differ significantly, you may as well use random effects. If they differ significantly, the assumptions for random effects are likely to be violated and you had better stick with fixed effects.
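In Stata the comparison could be run roughly as follows. This is a sketch on the nlswork example data rather than the asker's firm panel, and the stored-estimate names fe and re are arbitrary labels chosen here.

```stata
* Sketch only: nlswork stands in for the firm panel discussed above
webuse nlswork, clear
xtset idcode year

* consistent estimator: fixed effects
xtreg ln_wage age union i.year, fe
estimates store fe

* efficient estimator under the random effects assumption
xtreg ln_wage age union i.year, re
estimates store re

* consistent estimator first, efficient estimator second
hausman fe re
```

Both models are fitted with the default variance estimator here because hausman does not accept robust or clustered standard errors.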
