@Charlie is right. You only have two time periods, so there will inevitably be variation in the $i$-specific sample variances of $x_{it}$. In addition, even if you have programmed the simulation to have homogeneous effects, the small number of periods means there will inevitably be some sample correlation between $x_{it}$ and, e.g., your error term, and so there will inevitably be some "effect heterogeneity" in the $i$-specific partial relationships between $x_{it}$ and $y_{it}$.

The interaction of conditional variance and effect heterogeneity tilts your FE estimates of the coefficient on $x_{it}$: the FE coefficient on $x_{it}$ is a precision-weighted average of the $i$-specific coefficients on $x_{it}$. A different tilting occurs when you fit OLS to the model you specified above: now the coefficient on $x_{it}$ is a precision-weighted average of the coefficients on $x_{it}$ for those with $treatment_i=1$ and those with $treatment_i=0$. These differences propagate to your estimates of $\beta_3$. Think Frisch–Waugh–Lovell.

To demonstrate the validity of Charlie's claim, simply generate $x_{it}$'s whose variance is exactly constant for each $i$, while still having different patterns: e.g., randomly assign $i$'s to have either $(x_{i1}, x_{i2})=(0,1)$ or $(1,0)$. If you do this, you will see that the differences between the FE and OLS estimates disappear.
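A minimal sketch of that check in Python/NumPy (assuming the two-period model $y_{it} = \beta_1 x_{it} + \beta_3 x_{it}\,treatment_i + \alpha_i + \varepsilon_{it}$ with homogeneous effects; the parameter values and sample size are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, b1, b3 = 5000, 1.0, 0.5            # units, effect of x, interaction effect

treat = rng.integers(0, 2, N)         # time-invariant treatment indicator
pattern = rng.integers(0, 2, N)       # which x-pattern each unit gets
# x_it has identical within-unit variance for every i: either (0,1) or (1,0)
x = np.column_stack([pattern, 1 - pattern]).astype(float)  # shape (N, 2)

alpha = rng.normal(0, 1, N)           # unit-specific effects
eps = rng.normal(0, 1, (N, 2))
y = b1 * x + b3 * x * treat[:, None] + alpha[:, None] + eps

# --- FE (within) estimator: demean every variable within each unit ---
def within(v):
    return (v - v.mean(axis=1, keepdims=True)).ravel()

Xw = np.column_stack([within(x), within(x * treat[:, None])])
fe = np.linalg.lstsq(Xw, within(y), rcond=None)[0]

# --- pooled OLS with a treatment main effect (no unit effects) ---
xf, tf = x.ravel(), np.repeat(treat, 2)
Xo = np.column_stack([np.ones(2 * N), xf, tf, xf * tf])
ols = np.linalg.lstsq(Xo, y.ravel(), rcond=None)[0]

print("FE  coef on x, x*treat:", fe)
print("OLS coef on x, x*treat:", ols[1], ols[3])
```

With the within-unit variance of $x_{it}$ held exactly constant across $i$, the FE and pooled-OLS coefficients line up (up to the sampling noise contributed by the unit effects in the pooled error term).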
Question 1
If your outcome variable is integrated, you might consider using a single-equation generalized error correction model (GECM) as per Banerjee et al. (1993) and De Boef (2001), as this model is agnostic to the stationarity of the predictors.
You might evaluate the stationarity of your outcome using:
$\log{(GDP/Labor)_{ti}} \sim \rho_{i}\log{(GDP/Labor)_{t-1,i}} + \zeta_{ti} + \mu_{\rho_{i}}$,
where:
$\zeta_{ti}$ measures all disturbances to $\log{(GDP/Labor)_{ti}}$ at each time $t$ (assumed normally distributed), and
$\mu_{\rho_{i}}$ measures state-level variation in $\log{(GDP/Labor)_{ti}}$ (assumed normally distributed).
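As a rough diagnostic (a sketch in Python rather than a full mixed model; the simulated panel dimensions and true $\rho$ here are made up), you could fit the AR(1) state by state and inspect the estimated $\rho_i$:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy balanced panel: N state-level series over T years, rho near 1
N, T, rho_true = 40, 30, 0.95
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = rho_true * y[:, t - 1] + rng.normal(0, 0.1, N)

# state-by-state OLS of y_t on y_{t-1}; polyfit's leading coef is the slope
rho_hat = np.array([np.polyfit(y[i, :-1], y[i, 1:], 1)[0] for i in range(N)])
print(rho_hat.mean())  # near 1 -> treat the outcome as (nearly) integrated
```

Keep in mind that OLS estimates of $\rho_i$ are biased downward in short panels, so estimates somewhat below 1 can still indicate a (nearly) integrated series.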
If $|\rho_{i}| \approx 1$, then you've got nearly integrated data, and the GECM is appropriate; it also has the attractive property of disentangling long-run effects from both instantaneous short-run effects and lagged short-run effects.
The general form of the single equation GECM is:
$\Delta y_{t} = \beta_{0} + \beta_{c}\left[y_{t-1}-\mathbf{X}_{t-1}\right] + \mathbf{B}_{\Delta\mathbf{X}}\Delta\mathbf{X}_{t} + \mathbf{B}_{\mathbf{X}}\mathbf{X}_{t-1} + \varepsilon_{t}$,
where:
$\Delta$ is the first difference operator (e.g., $\Delta y_{t} = y_{t} - y_{t-1}$), and $\varepsilon_{t}$ may be decomposed into mixed effects (e.g., by including $\beta_{0i}$ for country-level random intercepts),
instantaneous short-run effects are given by $\mathbf{B}_{\Delta\mathbf{X}}$,
lagged short-run effects are given by $\mathbf{B}_{\mathbf{X}} - \beta_{c} - \mathbf{B}_{\Delta\mathbf{X}}$, and
long-run effects are given by $\left(\beta_{c}-\mathbf{B}_{\mathbf{X}}\right)/\beta_{c}$.
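To see where those expressions come from, rewrite the GECM in levels and solve for the steady state (a short derivation using the same symbols as above):

```latex
\begin{aligned}
y_{t} &= \beta_{0} + (1+\beta_{c})\,y_{t-1}
        + \mathbf{B}_{\Delta\mathbf{X}}\,\mathbf{X}_{t}
        + \left(\mathbf{B}_{\mathbf{X}} - \beta_{c} - \mathbf{B}_{\Delta\mathbf{X}}\right)\mathbf{X}_{t-1}
        + \varepsilon_{t},\\[4pt]
y^{*} &= -\frac{\beta_{0}}{\beta_{c}}
        + \frac{\beta_{c} - \mathbf{B}_{\mathbf{X}}}{\beta_{c}}\,\mathbf{X}^{*}
        \qquad \text{(setting } \Delta y_{t} = \Delta\mathbf{X}_{t} = \mathbf{0}\text{)}.
\end{aligned}
```

The coefficient on $\mathbf{X}_{t-1}$ in the first line is the lagged short-run effect, and the slope on $\mathbf{X}^{*}$ in the second line is the long-run effect.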
This specification assumes a homogeneity of error correction processes. I haven't yet tried to derive a heterogeneous error correction specification...
In Stata you can perform Hadri's Lagrange multiplier test for unit roots in panel data (xtunitroot hadri) on the residuals of such a model to check them for stationarity (note that Hadri's null hypothesis is stationarity).
Question 2
I do not know that I can say much useful here.
Question 3
Time dummies can be included in the GECM, and presumably in other dynamic time series models; they are often used as indicators of, for example, policies going into effect. I have done something similar, but used time-varying proportions (rather than 0/1 indicator variables) to represent the portion of the time period during which a policy was in effect (e.g., some policies go into effect January 1, some July 1, some December 21, etc.). On the other hand, you don't have tons of data, so I suppose it depends on how many new variables you are adding.
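That proportion variable is simple to construct; a minimal sketch in Python (the function name and the annual periodization are my own illustrative choices):

```python
from datetime import date

def in_effect_fraction(effective: date, year: int) -> float:
    """Fraction of `year` during which a policy, in force from
    `effective` onward, was in effect (0.0 to 1.0)."""
    start, end = date(year, 1, 1), date(year + 1, 1, 1)
    if effective >= end:
        return 0.0
    days_on = (end - max(effective, start)).days
    return days_on / (end - start).days

print(in_effect_fraction(date(2005, 7, 1), 2005))  # ~0.504 (184/365)
```

The same idea extends to quarterly or monthly panels, and to policies that are later repealed (subtract the post-repeal fraction).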
References:
Banerjee, A., Dolado, J. J., Galbraith, J. W., and Hendry, D. F. (1993). Co-integration, error correction, and the econometric analysis of non-stationary data. Oxford University Press, USA.
De Boef, S. (2001). Modeling equilibrium relationships: Error correction models with strongly autoregressive data. Political Analysis, 9(1):78–94.
Best Answer
As kjetil said in his comment, it should not be a problem to use a differenced variable in a fixed effects regression. The question is: why would you want to do it? First differencing removes information from your variables, and you lose one observation per panel. If the sole purpose is to remove the country-specific fixed effects, you might be throwing out the baby with the bathwater.
I also think there is some misconception with respect to the statistical programming part of your problem. What do you mean by "when we use fixed effects model, it automatically uses first differences of the data"? I don't know what statistical package you use, but, for instance, in Stata the command
xtreg lp hc fdi hc_fdi, fe
uses the within transformation and not first differences. Conversely, when you first difference your data and then use the regress command, this will give you a first-difference regression. Both are ways to eliminate the unobserved country-specific effects, and they do not need to be done together, as they are distinct concepts. It's probably worthwhile to review these two concepts (for instance in the lecture here).
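To make the distinction concrete, here is a small simulation (Python/NumPy rather than Stata; the parameter values are arbitrary) applying the two transformations to the same panel:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 500, 5, 2.0

alpha = rng.normal(0, 2, N)                   # unobserved unit effects
x = rng.normal(0, 1, (N, T))
y = beta * x + alpha[:, None] + rng.normal(0, 1, (N, T))

# Within (fixed effects) transformation: subtract each unit's mean
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = xw @ yw / (xw @ xw)

# First differencing: subtract the previous period's value
xd = np.diff(x, axis=1).ravel()
yd = np.diff(y, axis=1).ravel()
b_fd = xd @ yd / (xd @ xd)

print(b_fe, b_fd)  # both near beta, via different transformations
```

Both estimators are consistent for $\beta$ here, since each removes $\alpha_i$, but they are different transformations of the data (and with exactly two time periods they coincide), which is why there is no reason to difference and demean on top of each other.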