In most cases it is assumed that $E[\epsilon_t]=0$. Strict exogeneity then implies that the regressors are orthogonal to the error term across all pairs of periods $s$ and $t$, i.e. $E[x_s \epsilon_t]=0$. Some time series models violate this. Consider the AR(1) model $y_t=\beta y_{t-1}+\epsilon_t$ with $\epsilon_t \sim N(0, \sigma^2)$ for all $t$, independent over time. The regressor $y_{t-1}$ depends only on errors up to period $t-1$, so the error term $\epsilon_t$ is orthogonal to it, i.e.
$E[y_{t-1} \epsilon_t]=0$.
However, strict exogeneity requires every regressor, including $y_t$ (the regressor in the equation for $y_{t+1}$), to be orthogonal to *all* error terms, $\epsilon_t$ among them. That does not hold for this model, as the following shows:
$$\begin{aligned}
E[y_t \epsilon_t] &= E[(\beta y_{t-1}+\epsilon_t)\epsilon_t] && \text{(by } y_t=\beta y_{t-1}+\epsilon_t\text{)} \\
&= \beta E[y_{t-1} \epsilon_t]+E[\epsilon_t^2] \\
&= E[\epsilon_t^2] && \text{(by } E[y_{t-1} \epsilon_t]=0\text{)} \\
&= \sigma^2 && \text{(by } \epsilon_t \sim N(0, \sigma^2)\text{)}.
\end{aligned}$$
Therefore $y_t$, which is the regressor in the equation for $y_{t+1}$, is not orthogonal to all error terms: it is correlated with $\epsilon_t$. Thus, strict exogeneity is violated.
It could hold only if $\sigma^2 = 0$, i.e. if $\epsilon_t = 0$ for all $t$.
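The two moments above can also be checked numerically. The following is only a sketch: the values of `beta`, `sigma`, and `n` are illustrative assumptions, and the sample averages merely approximate the expectations.

```r
# Simulate y_t = beta * y_{t-1} + eps_t with independent eps_t ~ N(0, sigma^2).
# beta, sigma and n are illustrative assumptions, not values from the answer.
set.seed(1)
n     <- 100000
beta  <- 0.5
sigma <- 2

eps <- rnorm(n, mean = 0, sd = sigma)

# stats::filter(method = "recursive") computes y[t] = eps[t] + beta * y[t-1],
# with the pre-sample value initialised at 0.
y <- as.numeric(stats::filter(eps, filter = beta, method = "recursive"))

# Sample analogue of E[y_{t-1} eps_t]: pairs y[t-1] with eps[t]; should be near 0.
m_lagged <- mean(y[-n] * eps[-1])

# Sample analogue of E[y_t eps_t]: should be near sigma^2 = 4.
m_contemp <- mean(y * eps)

c(lagged = m_lagged, contemporaneous = m_contemp, sigma2 = sigma^2)
```

The lagged product averages out to roughly zero, while the contemporaneous one settles near $\sigma^2$, which is the violation derived above.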
My hunch, without having checked Wooldridge, is that he refers to a situation in which there are also individual-specific effects (country-specific, in your example) in addition to the time effects.
I ran

    library(plm)
    plm(y ~ x1 + country_age, data = Panel, effect = "twoways", model = "within")
    plm(y ~ x1 + country_age, data = Panel, effect = "time", model = "within")

on your first set of data, and do get a coefficient on country_age in the latter case, but not in the former.

    > plm(y ~ x1 + country_age, data = Panel, effect = "twoways", model = "within")

    Model Formula: y ~ x1 + country_age

    Coefficients:
            x1
    2409669178

    > plm(y ~ x1 + country_age, data = Panel, effect = "time", model = "within")

    Model Formula: y ~ x1 + country_age

    Coefficients:
             x1 country_age
     2409669178    91766658

Notice that including an individual-specific fixed effect amounts to unitwise demeaning of all regressors. If a regressor changes by the same constant amount each period in every unit, its unitwise demeaned version is collinear with the unitwise demeaned time effects.
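To make the collinearity explicit, here is a short derivation under a stylized assumption that is not spelled out in the question: the regressor is written as $x_{it} = a_i + \delta t$, where $a_i$ is the unit-specific starting point and $\delta$ the common per-period change, observed for $t = 1, \dots, T$. Unitwise demeaning then gives

$$x_{it} - \bar{x}_i = (a_i + \delta t) - (a_i + \delta \bar{t}) = \delta\,(t - \bar{t}), \qquad \bar{t} = \frac{1}{T}\sum_{t=1}^{T} t.$$

The right-hand side no longer depends on $i$: after demeaning, the regressor is a pure function of time, and any pure function of time is a linear combination of the time dummies.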
Consider the following artificial regressor matrix of a panel data model with individual-specific effects (the first two columns, i.e. two "countries"), time effects (3rd to 6th column) and a constant-change regressor with different starting points in the two units (7th column).
We observe that the regressor matrix has rank 5, so that even with different starting points, the time effects and the constant change regressor are collinear (one rank is lost due to collinearity of individual and time effects, which is why Wooldridge already drops the time dummy for the first year). Equivalently, even with different starting points and dropping column 3, we can combine columns 1, 2, 4, 5 and 6 into column 7 via
$$6\times x_1+7\times x_2+2\times x_4 +4\times x_5+6\times x_6.$$
    X <- matrix(c(rep(1,4), rep(0,4), rep(0,4), rep(1,4), # dummies for the units
                  rep(c(1,0,0,0),2), rep(c(0,1,0,0),2), rep(c(0,0,1,0),2), rep(c(0,0,0,1),2), # dummies for the time points
                  seq(6, by=2, length.out=4), seq(7, by=2, length.out=4)), ncol=7) # constant-increase regressor
    X
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]    1    0    1    0    0    0    6
    [2,]    1    0    0    1    0    0    8
    [3,]    1    0    0    0    1    0   10
    [4,]    1    0    0    0    0    1   12
    [5,]    0    1    1    0    0    0    7
    [6,]    0    1    0    1    0    0    9
    [7,]    0    1    0    0    1    0   11
    [8,]    0    1    0    0    0    1   13
    > qr(X)$rank
    [1] 5
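The linear dependence can be verified directly rather than read off the rank. The sketch below rebuilds the same matrix and checks one set of weights on columns 1, 2, 4, 5 and 6 that reproduces column 7. The weights are the two starting points (6 and 7) plus the cumulative increases (2, 4, 6) relative to the dropped first period; the names `w` and `combo` are introduced here for illustration only.

```r
# Rebuild the 8 x 7 regressor matrix from the answer.
X <- matrix(c(rep(1, 4), rep(0, 4), rep(0, 4), rep(1, 4),          # unit dummies
              rep(c(1, 0, 0, 0), 2), rep(c(0, 1, 0, 0), 2),
              rep(c(0, 0, 1, 0), 2), rep(c(0, 0, 0, 1), 2),        # time dummies
              seq(6, by = 2, length.out = 4),
              seq(7, by = 2, length.out = 4)),                     # constant-increase regressor
            ncol = 7)

# Weights on columns 1, 2, 4, 5, 6 (column 3, the first period, is dropped).
w     <- c(6, 7, 2, 4, 6)
combo <- as.numeric(X[, c(1, 2, 4, 5, 6)] %*% w)

all.equal(combo, X[, 7])          # TRUE: the combination reproduces column 7
qr(X[, c(1, 2, 4, 5, 6)])$rank    # 5: these five columns are linearly independent
qr(X)$rank                        # 5: adding column 7 does not raise the rank
```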
This also shows why time effects and a constant-change regressor with the same starting point in every unit (to try it, change the last four elements of the last column to 6, 8, 10, 12) cannot both be estimated, even without individual-specific effects. Just as individual-specific effects cannot be combined with time-invariant regressors, a regressor fitted alongside time effects needs variation across units.
Now, with the same starting point and the same increases, the regressor takes the same value across units for each point in time and hence gets dropped when fitting time effects:

    > lm(y~X[,3:7]-1)

    Call:
    lm(formula = y ~ X[, 3:7] - 1)

    Coefficients:
    X[, 3:7]1  X[, 3:7]2  X[, 3:7]3  X[, 3:7]4  X[, 3:7]5
     -1.16909   -0.51927    0.02666    0.41310         NA

Equivalently, columns 3 to 6 alone can then be linearly combined into column 7.
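A self-contained way to reproduce this case is sketched below. The answer does not show how `y` was generated, so the response here is arbitrary simulated noise (an assumption), and the estimates will differ from those printed above. Only the pattern matters: four estimated time effects and an `NA` for the regressor.

```r
set.seed(123)

# Time dummies for two units observed over four periods.
time_dummies <- rbind(diag(4), diag(4))

# Regressor with the same starting point (6) and the same increase (2) in both units.
x_common <- rep(seq(6, by = 2, length.out = 4), times = 2)

Z <- cbind(time_dummies, x_common)
colnames(Z) <- c("t1", "t2", "t3", "t4", "x")

rank_Z <- qr(Z)$rank   # 4, not 5: the regressor adds no new information

# The time dummies alone reproduce the regressor, weighted by its period values.
recovered <- as.numeric(time_dummies %*% c(6, 8, 10, 12))

# Arbitrary response (assumption): any y yields an NA coefficient on x.
y   <- rnorm(8)
fit <- lm(y ~ Z - 1)
coef(fit)
```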
Best Answer
Strict exogeneity means that the error $u_t$ is uncorrelated with the seasonal dummies in every period: past, present, and future. This means that such variables cannot react to shocks to $y$ in the past or the future. Suppose consumers feel worried about the economy in December and such sentiments are unobserved. This means there was a negative shock to Amazon sales that month as people cut back on presents. Big negative error. I don't get my pony. But Amazon cannot simply decide to have a Christmas season again in January. Contrast this with the effect-of-police-on-crime example from earlier in that chapter. If there was a gang war in December, the police force would jump $n$ months later as the mayor gets tough on crime and the cadets graduate. That would violate the strict exogeneity assumption.
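The contrast between the two examples can be illustrated with a toy simulation. Everything in it is hypothetical: the feedback strength `0.8`, the delay `lag_n`, and the variable names are assumptions made for illustration, not taken from Wooldridge's data.

```r
set.seed(42)
n     <- 5000
lag_n <- 3     # hypothetical delay (in months) before the police force responds

u <- rnorm(n)  # unobserved shocks to the outcome

# A December dummy is fixed by the calendar and cannot respond to shocks.
december <- as.numeric(rep(1:12, length.out = n) == 12)

# Hypothetical feedback rule: police in month t respond to the shock in month t - lag_n.
police <- c(rep(0, lag_n), 0.8 * u[1:(n - lag_n)]) + rnorm(n)

idx_future <- (1 + lag_n):n   # periods t + lag_n
idx_now    <- 1:(n - lag_n)   # periods t

cor_dummy  <- cor(december[idx_future], u[idx_now])  # close to 0
cor_police <- cor(police[idx_future],   u[idx_now])  # clearly positive

c(dummy = cor_dummy, police = cor_police)
```

The future value of the dummy is essentially uncorrelated with today's shock, whereas the future police variable is not, which is exactly the kind of feedback that strict exogeneity rules out.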