I have worked on similar projects and am confronting one right now. The way that we handle this is to put in a fixed effect for each village and then to cluster the standard errors by village. This is not a perfect solution, but is fairly standard practice.
The plm package in R, the xtreg ..., fe command in Stata, and the traditional fixed effect (within) estimator are all designed to follow individuals over time. I believe one name for the method that you want is a hierarchical linear model.
The simplest implementation in R would be something like
myLM <- lm(y ~ x + v + v.t*t, data=df)
where y is the outcome of interest, x is some set of controls, v is a factor variable for the villages, v.t is a binary (factor) variable indicating whether a village was treated, and t is an indicator for pre-post treatment.
For standard inference, it is typical and recommended to produce clustered standard errors, using either the multiwayvcov package or the clusterSEs package.
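To make the idea of clustering by village concrete, here is a minimal base-R sketch that builds the cluster-robust "sandwich" by hand on simulated data. Everything in it is an assumption for illustration: the simulated data frame, the sample sizes, and the small-sample correction factor. It is not the API of either package mentioned above, which wrap this kind of calculation for you.

```r
set.seed(1)
# Hypothetical simulated data: 20 villages, 10 households each, 2 periods
G <- 20; n_per <- 10
df <- expand.grid(hh = 1:n_per, v = 1:G, t = 0:1)
df$v.t <- as.numeric(df$v <= G / 2)        # first half of the villages treated
df$x <- rnorm(nrow(df))
village_shock <- rnorm(G)[df$v]            # village-level unobservable
df$y <- 1 + 0.5 * df$x + village_shock + 0.3 * df$t +
  2 * df$v.t * df$t + rnorm(nrow(df))
df$v <- factor(df$v)

# The main effect of v.t is absorbed by the village dummies,
# so only t and the interaction t:v.t are entered explicitly.
fit <- lm(y ~ x + v + t + t:v.t, data = df)

X <- model.matrix(fit)
u <- residuals(fit)
bread <- solve(crossprod(X))
scores <- rowsum(X * u, group = df$v)      # sum X_i * u_i within each village
meat <- crossprod(scores)
N <- nrow(X); K <- ncol(X)
adj <- (G / (G - 1)) * ((N - 1) / (N - K)) # one common small-sample correction
V_cl <- adj * bread %*% meat %*% bread

keep <- c("x", "t", "t:v.t")
se_cl <- sqrt(diag(V_cl)[keep])
cbind(estimate = coef(fit)[keep], cluster_se = se_cl)
```

The coefficient on t:v.t is the difference-in-differences estimate, and its clustered standard error allows errors to be arbitrarily correlated within a village.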
Another method for inference, and the preferred method in Bertrand, Duflo & Mullainathan (2004), is to perform a placebo test: assign "treatment" to each village in turn, form an empirical CDF of the resulting estimates, and see where the effect of treatment for the truly treated village sits in that distribution. Note that this is roughly the same method recommended for inference with the synthetic controls of Abadie, Diamond, and Hainmueller, and has ties back to Fisher's 1935 text.
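The placebo procedure just described can be sketched in base R roughly as follows. The data are simulated, and the setup (one truly treated village, 30 villages, the helper name did_for) is hypothetical and chosen only to keep the example short.

```r
set.seed(2)
# Hypothetical setup: 30 villages, village 1 is the truly treated one
G <- 30; n_per <- 20
df <- expand.grid(hh = 1:n_per, v = 1:G, t = 0:1)
village_shock <- rnorm(G)[df$v]
df$y <- village_shock + 0.3 * df$t + 1.5 * (df$v == 1) * df$t + rnorm(nrow(df))

# Difference-in-differences estimate when village g is labelled "treated"
did_for <- function(g, data, treated = 1) {
  # keep the really treated village out of the placebo runs
  if (g != treated) data <- data[data$v != treated, ]
  data$d <- as.numeric(data$v == g)
  unname(coef(lm(y ~ factor(v) + t + t:d, data = data))["t:d"])
}

effects <- sapply(seq_len(G), did_for, data = df)  # element 1 is the real effect
placebo_cdf <- ecdf(effects[-1])                   # empirical CDF of placebo effects
placebo_cdf(effects[1])                            # where the real effect sits
rank_p <- mean(abs(effects) >= abs(effects[1]))    # permutation-style p-value
rank_p
```

With G villages the smallest attainable p-value is 1/G, so this approach needs a reasonable number of untreated villages to be informative.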
Choosing between RE and FE depends on your assumptions about the error term. FE tries to remove constant unobserved heterogeneity, whereas RE assumes that no unobserved factors are correlated with the regressors and instead corrects for serial correlation.
Use RE only if you think that $cov(x_{itj},a_{i})=0$. Typically FE is much more convincing, and the leading case for using RE is when an important variable is time-constant, but then correlated random effects can be employed. If you are willing to make a very strict set of assumptions, then you could use the Hausman test to help you decide.
For an introduction to correlated random effects see this - *.pdf, from the master himself (Wooldridge).
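As a rough illustration of the correlated random effects idea, the sketch below uses the Mundlak device in its pooled OLS form: add the unit mean of x as an extra regressor. The data are simulated and all variable names are hypothetical. In this setup the coefficient on x matches the within (FE) estimate, while a time-constant variable z stays in the model.

```r
set.seed(3)
# Hypothetical panel: 200 units, 5 periods; the unit effect a is correlated with x
n <- 200; n_t <- 5
id <- rep(1:n, each = n_t)
a <- rnorm(n)                           # unobserved unit effect
z <- rbinom(n, 1, 0.5)                  # time-constant covariate
x <- 0.8 * a[id] + rnorm(n * n_t)       # violates cov(x, a) = 0
y <- 1 + 2 * x + 1 * z[id] + a[id] + rnorm(n * n_t)
df <- data.frame(id, y, x, z = z[id])
df$xbar <- ave(df$x, df$id)             # unit mean of x (the Mundlak term)

fe  <- lm(y ~ x + factor(id), data = df)   # within estimator via unit dummies
cre <- lm(y ~ x + xbar + z, data = df)     # pooled OLS with the Mundlak term

c(fe = unname(coef(fe)["x"]), cre = unname(coef(cre)["x"]))  # the two coincide
coef(cre)["z"]                             # z is estimable here, unlike under FE
```

A coefficient on xbar that is clearly different from zero is evidence against the plain RE assumption, which is the intuition behind regression-based versions of the Hausman test. Standard errors from this pooled fit should still be clustered by unit.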
Best Answer
You are correct. You can't use a fixed effects model to analyze the effect of a treatment that is assigned at the "group" level. In your case the "groups" are people and the individual observations are time points, "nested" within people.
The reason is that including person fixed effects accounts for ALL possible between-person variation. This is great when you are really interested in within-person variation, because you don't have to worry about ANY person-level confounders. But in your case one of those person-level differences is which treatment they got! So, as you note, trying to include a treatment variable alongside person fixed effects will cause perfect multicollinearity and lead to the treatment variable being dropped.
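The collinearity is easy to reproduce on simulated data. In this small base-R sketch (hypothetical sample sizes and effect sizes), treatment is fixed within each person, so it is an exact linear combination of the person dummies and lm() reports NA for it.

```r
set.seed(4)
# Hypothetical data: 40 people, 4 time points, treatment constant within person
n <- 40; n_t <- 4
id <- rep(1:n, each = n_t)
treat <- rep(rep(c(0, 1), each = n / 2), each = n_t)  # second half of people treated
y <- 1 + 0.7 * treat + rnorm(n)[id] + rnorm(n * n_t)

fit_fe <- lm(y ~ factor(id) + treat)
coef(fit_fe)["treat"]   # NA: treat is absorbed by the person fixed effects
```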
So as you say, you need to use a random effects model. This in turn means that if you are worried about other person-level confounders (for example if older people were more likely to get treatment A) you also need to include variables for those confounders in your model.
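A minimal sketch of such a random effects (random intercept) model, assuming the nlme package that ships with most R installations. The data are simulated so that older people are more likely to be treated, which is why age is entered as a covariate. All names and numbers are hypothetical.

```r
library(nlme)
set.seed(5)
# Hypothetical data: 60 people, 4 time points each
n <- 60; n_t <- 4
df <- data.frame(id = rep(1:n, each = n_t))
age <- round(runif(n, 20, 70))
treat <- rbinom(n, 1, plogis((age - 45) / 10))  # older people treated more often
df$age <- age[df$id]
df$treat <- treat[df$id]
df$y <- 0.7 * df$treat + 0.02 * df$age + rnorm(n)[df$id] + rnorm(nrow(df))

# Random intercept per person; treatment and the confounder are fixed effects
fit_re <- lme(y ~ treat + age, random = ~ 1 | id, data = df)
fixef(fit_re)   # the treatment coefficient is estimated rather than dropped
```

Unlike the fixed effects fit, this only adjusts for the person-level confounders you actually include, so leaving out age here would bias the treatment estimate.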