Solved – Strict Exogeneity and Seasonal Dummy Variables

econometrics, exogeneity, regression

In his introductory econometrics textbook, Wooldridge states that seasonal dummy variables (say, a dummy for the calendar month) satisfy the strict exogeneity assumption because "they follow a deterministic pattern. For example, the months do not change based upon whether the explanatory variables or the dependent variable changes".

Why is this? What does this explanation have to do with whether the error term (factors not included in the regression that influence the dependent variable) is correlated with any of the seasonal dummies?

Best Answer

Strict exogeneity means that the error $u_t$ is uncorrelated with all past, present, and future values of the seasonal dummies. In particular, such variables cannot react to past or future shocks to $y$. Suppose consumers feel worried about the economy in December, and that this sentiment is unobserved. Then there is a negative shock to Amazon sales that month as people cut back on presents: a big negative error. I don't get my pony. But Amazon cannot simply decide to have a Christmas season again in January. Contrast this with the police-and-crime example from earlier in that chapter. If there were a gang war in December, the police force would jump $n$ months later as the mayor gets tough on crime and the cadets graduate. That would violate the strict exogeneity assumption.
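The contrast can be made concrete with a small simulation. This is only a sketch under stated assumptions: the feedback rule `police_t = u_{t-1} + noise`, the noise scale, and the sample size are all made up for illustration and are not taken from Wooldridge.

```r
set.seed(42)
n <- 12000                                  # 1000 simulated years of monthly data
month    <- rep(1:12, length.out = n)
december <- as.numeric(month == 12)         # deterministic seasonal dummy
u        <- rnorm(n)                        # unobserved shocks to y

# Hypothetical feedback rule (an assumption): police_t = u_{t-1} + noise
police <- c(0, u[-n]) + rnorm(n, sd = 0.5)

# Correlation of this period's shock u_t with NEXT period's regressor value
cor_police   <- cor(u[-n], police[-1])      # far from zero: the regressor reacts to shocks
cor_december <- cor(u[-n], december[-1])    # close to zero: the calendar does not react
c(police = cor_police, december = cor_december)
```

The December dummy is fixed by the calendar before any shock is drawn, so its future values cannot depend on $u_t$. The simulated police series does depend on it by construction.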

Related Solutions

Solved – Strict exogeneity and lagged variables

In most cases it is assumed that $E[\epsilon_t]=0$. Strict exogeneity then implies that the regressors are orthogonal to the error term at all time periods, i.e. $E[x_s \epsilon_t]=0$ for all $s$ and $t$. For some time series models this is violated. Consider the AR(1) model $y_t=\beta y_{t-1}+\epsilon_t$ with $\epsilon_t \sim N(0, \sigma^2)$ independently for all $t$. Since $y_t$ is regressed on $y_{t-1}$, and $y_{t-1}$ depends only on errors up to period $t-1$, the error term $\epsilon_t$ is orthogonal to the regressor $y_{t-1}$, i.e. $E[y_{t-1} \epsilon_t]=0$.

However, strict exogeneity requires the regressor to be orthogonal to the error terms of all periods, not just the current one. In particular, $y_t$, the regressor in the equation for $y_{t+1}$, would have to be orthogonal to $\epsilon_t$. That does not hold for this model, as the following shows:

$$\begin{aligned}
E[y_t \epsilon_t] &= E[(\beta y_{t-1}+\epsilon_t)\epsilon_t] && \text{(by } y_t=\beta y_{t-1}+\epsilon_t\text{)} \\
&= \beta E[y_{t-1}\epsilon_t]+E[\epsilon_t^2] \\
&= E[\epsilon_t^2] && \text{(by } E[y_{t-1}\epsilon_t]=0\text{)} \\
&= \sigma^2 && \text{(by } \epsilon_t \sim N(0,\sigma^2)\text{)}
\end{aligned}$$

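Therefore $y_t$, which is the regressor in the equation for $y_{t+1}$, is correlated with the error term $\epsilon_t$, so the regressor is not orthogonal to the errors of all periods. Thus, strict exogeneity is violated.

This implies that strict exogeneity can hold only in the degenerate case $\sigma^2=0$, i.e. $\epsilon_t = 0$ for all $t$.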
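As a rough numerical check of the two moments used above, one can simulate the AR(1) process and compare sample averages with their theoretical values. The values `beta = 0.5` and `sigma = 1`, as well as the sample size, are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
n     <- 100000
beta  <- 0.5      # assumed AR coefficient
sigma <- 1        # assumed error standard deviation

eps  <- rnorm(n, sd = sigma)
y    <- numeric(n)
y[1] <- eps[1]
for (t in 2:n) y[t] <- beta * y[t - 1] + eps[t]   # y_t = beta * y_{t-1} + eps_t

m_lag  <- mean(y[-n] * eps[-1])   # sample analogue of E[y_{t-1} eps_t], expected near 0
m_same <- mean(y * eps)           # sample analogue of E[y_t eps_t], expected near sigma^2
c(lagged = m_lag, contemporaneous = m_same)
```

The first average should be close to zero and the second close to $\sigma^2$, matching the derivation: the regressor is uncorrelated with the current error but not with the previous one.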

Fixed Effects Model – Analyzing Time Fixed-Effects and Constantly Changing Variables

My hunch would be - without having checked Wooldridge - that he refers to a situation in which there also are individual (country, in your example)-specific effects next to the time effects.

I ran

library(plm)
plm(y ~ x1 + country_age, data = Panel, effect = "twoways", model = "within")
plm(y ~ x1 + country_age, data = Panel, effect = "time", model = "within")

on your first set of data, and do get a coefficient on country_age in the latter case, but not in the former.

> plm(y ~ x1 + country_age, data = Panel, effect = "twoways", model = "within")

Model Formula: y ~ x1 + country_age

Coefficients:
        x1 
2409669178 

> plm(y ~ x1 + country_age, data = Panel, effect = "time", model = "within")

Model Formula: y ~ x1 + country_age

Coefficients:
         x1 country_age 
 2409669178    91766658

Notice that including an individual-specific fixed effect amounts to unitwise demeaning of all regressors. If a regressor changes by the same constant amount each period in every unit, the demeaned variable will be collinear with the unitwise demeaned time effects.
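The demeaning argument can be sketched directly in base R. The two-unit, four-period layout and the names `unit`, `time`, `age` and `demean` are illustrative assumptions, not objects from the question's data.

```r
unit <- rep(1:2, each = 4)                       # two units
time <- rep(1:4, times = 2)                      # four periods each
age  <- c(seq(6, by = 2, length.out = 4),        # unit 1 starts at 6
          seq(7, by = 2, length.out = 4))        # unit 2 starts at 7

# Within transformation: subtract each unit's own mean
demean <- function(v) v - ave(v, unit)

age_dm <- demean(age)                                     # demeaned regressor
D      <- sapply(1:4, function(j) as.numeric(time == j))  # time dummies
D_dm   <- apply(D, 2, demean)                             # demeaned time dummies

age_dm
rank_time     <- qr(D_dm)$rank                   # rank of demeaned time dummies alone
rank_time_age <- qr(cbind(D_dm, age_dm))$rank    # rank after adding the regressor
c(rank_time, rank_time_age)
```

After demeaning, the regressor takes the same values in both units, so the different starting points have vanished. Adding it to the demeaned time dummies does not raise the rank, which is why no separate coefficient can be estimated.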

Consider the following artificial regressor matrix of a panel data model with individual-specific effects (the first two columns, i.e. two "countries"), time effects (3rd to 6th column) and a constant-change regressor with different starting points (7th column).

We observe that the regressor matrix has rank 5, so that even with different starting points, the time effects and the constant change regressor are collinear (one rank is lost due to collinearity of individual and time effects, which is why Wooldridge already drops the time dummy for the first year). Equivalently, even with different starting points and dropping column 3, we can combine columns 1, 2, 4, 5 and 6 into column 7 via

$$6\times x_1+7\times x_2+2\times x_4 +4\times x_5+6\times x_6.$$

X <- matrix(c(rep(1,4), rep(0,4), rep(0,4), rep(1,4), # dummies for the units
               rep(c(1,0,0,0),2), rep(c(0,1,0,0),2), rep(c(0,0,1,0),2), rep(c(0,0,0,1),2), # dummies for the time points
               seq(6, by=2, length.out=4), seq(7, by=2, length.out=4)), ncol=7) # constant-increase regressor
X
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    1    0    0    0    6
[2,]    1    0    0    1    0    0    8
[3,]    1    0    0    0    1    0   10
[4,]    1    0    0    0    0    1   12
[5,]    0    1    1    0    0    0    7
[6,]    0    1    0    1    0    0    9
[7,]    0    1    0    0    1    0   11
[8,]    0    1    0    0    0    1   13
> qr(X)$rank
[1] 5
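The linear dependence can be checked numerically. The sketch below rebuilds the same `X` so that it runs on its own; the weight vector `w` is one set of weights on columns 1, 2, 4, 5 and 6 that reproduces column 7, and `qr.solve` is used only to confirm that a least-squares fit recovers those weights.

```r
X <- matrix(c(rep(1,4), rep(0,4), rep(0,4), rep(1,4),            # unit dummies
              rep(c(1,0,0,0),2), rep(c(0,1,0,0),2),
              rep(c(0,0,1,0),2), rep(c(0,0,0,1),2),              # time dummies
              seq(6, by=2, length.out=4), seq(7, by=2, length.out=4)), ncol=7)

keep  <- c(1, 2, 4, 5, 6)                   # drop the first time dummy (column 3)
w     <- c(6, 7, 2, 4, 6)                   # weights on columns 1, 2, 4, 5, 6
combo <- as.numeric(X[, keep] %*% w)

reproduces_col7 <- isTRUE(all.equal(combo, X[, 7]))   # does the combination equal column 7?
w_hat <- qr.solve(X[, keep], X[, 7])                  # least-squares weights
reproduces_col7
round(w_hat, 10)
```

The unit weights pick up the two starting points, and the time weights pick up the cumulative increase since the first period.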

This also shows why time effects and same starting points (modify the last four elements of the last column to 6, 8, 10, 12 to try) cannot both be estimated even without individual-specific effects: just as individual-specific effects do not go together with time-invariant regressors, regressors require variation across units when being fitted next to time effects.

Now, with the same starting point and the same increases, the regressor takes the same value across units for each point in time and hence gets dropped when fitting time effects:

> lm(y~X[,3:7]-1)

Call:
lm(formula = y ~ X[, 3:7] - 1)

Coefficients:
X[, 3:7]1  X[, 3:7]2  X[, 3:7]3  X[, 3:7]4  X[, 3:7]5  
 -1.16909   -0.51927    0.02666    0.41310         NA

Equivalently, columns 3 to 6 alone can then be linearly combined into column 7.
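For the same-starting-point case, a minimal self-contained check (with `D` and `age` as illustrative names for the time dummies and the regressor) is:

```r
# Time dummies for two units observed over four periods
D <- cbind(rep(c(1,0,0,0),2), rep(c(0,1,0,0),2),
           rep(c(0,0,1,0),2), rep(c(0,0,0,1),2))

# Same starting point (6) and same increase (2) in both units
age <- rep(seq(6, by = 2, length.out = 4), 2)

rank_all <- qr(cbind(D, age))$rank          # rank of time dummies plus regressor
combo    <- as.numeric(D %*% c(6, 8, 10, 12))  # time dummies weighted by the period values
c(rank = rank_all, reproduces_age = all(combo == age))
```

Five columns but only rank four: the regressor is just the time dummies weighted by its per-period values, which matches the `NA` coefficient in the `lm` output.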
