I have worked on similar projects and am confronting one right now. The way that we handle this is to put in a fixed effect for each village and then to cluster the standard errors by village. This is not a perfect solution, but is fairly standard practice.
The plm package in R, the xtreg ..., fe command in Stata, and the traditional fixed-effects (within) estimator are all designed to follow individuals. I believe one name for the method that you want is a hierarchical linear model.
The simplest implementation in R would be something like
myLM <- lm(y ~ x + v + v.t*t, data=df)
where y is the outcome of interest, x is some set of controls, v is a factor variable for the villages, v.t is a binary (factor) variable indicating whether a village was treated, and t is an indicator for pre- vs. post-treatment. Note that the main effect of v.t is constant within each village and therefore collinear with the village dummies v, so lm() will drop it; the difference-in-differences estimate is the coefficient on the v.t:t interaction.
For standard inference, it is typical and recommended to produce clustered standard errors using either the multiwayvcov package or the clusterSEs package.
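For instance, here is a minimal sketch with multiwayvcov, clustering on the village factor v from the regression above; lmtest's coeftest() prints the tests with the adjusted variance matrix:

library(multiwayvcov)  # for cluster.vcov()
library(lmtest)        # for coeftest()
vcov_v <- cluster.vcov(myLM, df$v)  # cluster-robust variance matrix, clustered by village
coeftest(myLM, vcov. = vcov_v)      # coefficient tests using the clustered SEs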
Another method for inference, and the one preferred by Bertrand, Duflo & Mullainathan (2004), is a placebo test: assign the "treatment" in turn to each village, form an empirical CDF of the resulting estimates, and see where the effect for the truly treated village sits in that distribution. Note that this is roughly the same method recommended for inference with the synthetic controls of Abadie, Diamond, and Hainmueller, and it has ties back to Fisher's 1935 text.
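A rough sketch of that procedure, assuming t is a numeric 0/1 pre/post indicator and using "V1" as a placeholder name for the truly treated village:

villages <- levels(df$v)  # v is the village factor from the regression above
placebo_effects <- sapply(villages, function(vl) {
  df$placebo <- as.numeric(df$v == vl)      # pretend vl was the treated village
  fit <- lm(y ~ x + v + placebo:t, data = df)
  coef(fit)["placebo:t"]                    # placebo "treatment effect"
})
true_effect <- placebo_effects["V1"]        # "V1" is a placeholder for the treated village
mean(abs(placebo_effects) >= abs(true_effect))  # permutation-style p-value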
These are also called "individual-specific intercepts", because one way to estimate the FE model is to run a "least squares dummy variables" (LSDV) regression, in which one regresses $y$ on $x$ and $n$ dummy variables, where each individual in the panel has one dummy that takes the value one if an observation belongs to that unit (person, household, firm, ...). The $\hat\alpha_i$ then estimate these intercepts, which may be interpreted as usual regression intercepts, with the only difference that each intercept is specific to a single unit.
Best Answer
If you have $N$ individuals and you include $N-1$ individual dummies (one less in order to avoid the dummy variable trap) in an OLS regression like $$y_{it} = X'_{it}\beta + \sum_{i=1}^{N-1}\delta_i (\text{individual}_i) + \epsilon_{it}$$ then this is called a least squares dummy variable (LSDV) regression. In this case, each individual dummy will "absorb" the individual fixed effects $u_i$ that are hidden in the error term $\epsilon_{it} = u_i + e_{it}$.
Mundlak (1978) has shown that the LSDV regression is equivalent to the fixed effects (within) estimator: $$y_{it} - \overline{y}_{i} = (X_{it} - \overline{X}_i)'\beta + \epsilon_{it} - \overline{\epsilon}_i$$ where $\overline{y}_{i} = \frac{1}{T}\sum^{T}_{t=1}y_{it}$, $\overline{X}_{i} = \frac{1}{T}\sum^{T}_{t=1}X_{it}$, and $\overline{\epsilon}_{i} = \frac{1}{T}\sum^{T}_{t=1}\epsilon_{it}$. Back in the days when computers weren't very fast, large panels made LSDV practically infeasible because there were too many dummies. Mundlak's finding was therefore very useful: it dispenses with all these individual dummies, and using the within transformation instead made estimation much simpler.
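To see the equivalence concretely, here is a small simulated check in R (all data and names are made up for illustration):

set.seed(1)
n <- 50; tp <- 5                       # 50 individuals, 5 time periods
id <- rep(1:n, each = tp)
x  <- rnorm(n * tp)
u  <- rep(rnorm(n), each = tp)         # time-invariant individual effects
y  <- 2 * x + u + rnorm(n * tp)
lsdv <- lm(y ~ x + factor(id))         # LSDV: one dummy per individual
y_dm <- y - ave(y, id)                 # within transformation: demean by individual
x_dm <- x - ave(x, id)
within <- lm(y_dm ~ x_dm - 1)
c(coef(lsdv)["x"], coef(within)["x_dm"])  # identical slope estimates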
So if you do a fixed effects regression, you don't need to include the individual dummies; in fact, your statistical software will simply drop them if you do. Likewise, in a first differences regression the individual dummies drop out because they do not change over time: the difference is zero for every dummy, so your software omits them due to perfect collinearity (see the quick check below). Either fixed effects or first differences already solves the problem of time-invariant unobserved variables ($u_i$); LSDV is just another way of doing the same thing, so it won't help you to combine it with the other methods.
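A quick check, continuing the simulated example above: an individual dummy is wiped out by the within transformation, which is exactly why software drops it.

d1 <- as.numeric(id == 1)   # dummy for individual 1
all(d1 - ave(d1, id) == 0)  # TRUE: identically zero after demeaning, hence perfectly collinear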
When you include individual dummies after first differencing your other variables, i.e. in a first differences regression with individual dummies, those dummies instead estimate individual-specific trend effects (see page 77, footnote 1 in the notes here).
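A made-up-data sketch of that point: after first differencing, an individual dummy is a unit-specific constant, which corresponds to a unit-specific linear trend in levels.

set.seed(2)
n <- 30; tp <- 6
id    <- rep(1:n, each = tp)
time  <- rep(1:tp, times = n)
trend <- rep(runif(n, -1, 1), each = tp)   # each unit has its own linear trend
x <- rnorm(n * tp)
y <- 2 * x + trend * time + rnorm(n * tp)
dy <- ave(y, id, FUN = function(z) c(NA, diff(z)))  # first differences within unit
dx <- ave(x, id, FUN = function(z) c(NA, diff(z)))
fd <- lm(dy ~ dx + factor(id) - 1)  # dummies estimate each unit's trend
coef(fd)[1:4]                       # slope on dx near 2; dummy coefficients near the trends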