# When and how of stratification

conditioning, experiment-design, random allocation, stratification

I understand stratification at a novice level. For example, if we want to condition on gender in post-experiment inference, we might stratify or block on gender. As I understand it, for each value the blocking variable can take, we retrieve all units with that value and randomly assign them to treatment and control. As a result, within each block the two arms will either be evenly split or differ by $$\pm 1$$ unit (if the block size is odd).
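To make that procedure concrete, here is a minimal Python sketch of how I picture it. The function name `stratified_assign`, the toy `units` list, and the coin flip that decides which arm gets the leftover unit in an odd-sized block are all assumptions made for illustration, not a standard API.

```python
import random
from collections import defaultdict

def stratified_assign(units, stratum_of, seed=0):
    """Randomly split units into treatment/control within each stratum."""
    rng = random.Random(seed)

    # Group units by the value of the blocking variable.
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[stratum_of(u)].append(u)

    assignment = {}
    for members in by_stratum.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_treat = len(shuffled) // 2
        # Odd-sized stratum: a coin flip decides which arm gets the extra unit.
        if len(shuffled) % 2 and rng.random() < 0.5:
            n_treat += 1
        for i, u in enumerate(shuffled):
            assignment[u] = "treatment" if i < n_treat else "control"
    return assignment

# Hypothetical sample: 11 units as (id, gender) pairs, 4 "M" and 7 "F".
units = [(i, "F" if i % 3 else "M") for i in range(11)]
assignment = stratified_assign(units, stratum_of=lambda u: u[1])
```

In this toy sample the even-sized block splits 2/2, and the odd-sized block splits 3/4 or 4/3 depending on the coin flip.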

Some questions

1. When is this necessary? (Aka Shouldn't pure randomization get "close enough"?)
2. Is it wrong to condition on a variable where stratification was not done?
3. What about continuous variables, is there an agreed upon approach?

To the last question, I imagine that the data could be grouped into quantiles and then pure randomization is applied to each quantile block.
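A rough sketch of that quantile idea, assuming equal-sized blocks formed by sorting on the covariate. The function name `quantile_block_assign`, the choice of four blocks, and the simulated `ages` are hypothetical and only there to illustrate the mechanics.

```python
import random

def quantile_block_assign(values, n_blocks=4, seed=0):
    """Cut a continuous covariate into quantile blocks, then randomize within each."""
    rng = random.Random(seed)
    n = len(values)
    # Indices of the units, ordered by the covariate value.
    order = sorted(range(n), key=lambda i: values[i])

    arms = [None] * n
    blocks = [None] * n
    for b in range(n_blocks):
        # Consecutive slices of the sorted order act as quantile blocks.
        idx = order[b * n // n_blocks:(b + 1) * n // n_blocks]
        rng.shuffle(idx)
        n_treat = len(idx) // 2
        # Odd-sized block: a coin flip decides which arm gets the extra unit.
        if len(idx) % 2 and rng.random() < 0.5:
            n_treat += 1
        for j, i in enumerate(idx):
            arms[i] = "treatment" if j < n_treat else "control"
            blocks[i] = b
    return arms, blocks

# Hypothetical covariate: 40 simulated ages between 18 and 80.
age_rng = random.Random(1)
ages = [age_rng.uniform(18, 80) for _ in range(40)]
arms, blocks = quantile_block_assign(ages, n_blocks=4)
```

With 40 units and four blocks, each block holds 10 units and is split 5/5, so the age distribution is roughly matched between arms by construction.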

Lastly, as I understand it, ANCOVA can correct for imbalances in the dependent variable and its covariates; does stratification just exclusively prevent any strong correlations therein? (Whereas it's just assumed to be highly unlikely but not impossible through pure randomization?)

#### Best Answer

This is a big topic, but a few points for starters.

## Q1 "When is this necessary? (Aka Shouldn't pure randomization get "close enough"?)"

Stratification is really only done for two reasons: 1) to control for known sources of variance (in your example, gender); 2) to force coverage.

You're already thinking about 1) in your post. For 2), say you have a population where 5% of individuals have some characteristic, and 95% do not. If you want to compare the two groups, that 5/95 split (which is what you'd expect under pure randomisation) will probably not give you enough power to determine anything, so you could use stratification to force more of the sample into the rare group. A different example of forcing coverage is where you're contractually obligated to have a certain amount of sample in different regions.
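As a rough illustration of the power point, here is a small Python sketch. It assumes a simple comparison of two group means with equal variance in both groups, so the standard error of the difference is proportional to $$\sqrt{1/n_1 + 1/n_2}$$. The total sample size of 1000 and the helper name `se_factor` are made up for the example.

```python
import math

def se_factor(n_rare, n_common):
    """Standard error of a difference in means, up to the common sigma (equal variances assumed)."""
    return math.sqrt(1 / n_rare + 1 / n_common)

n_total = 1000  # hypothetical total sample size

# Proportional allocation: the sample mirrors the 5/95 population split.
proportional = se_factor(50, 950)

# Stratified allocation that forces half of the sample into the rare group.
balanced = se_factor(500, 500)

# How much wider the standard error is under the 5/95 split.
ratio = proportional / balanced
print(f"proportional: {proportional:.3f}, balanced: {balanced:.3f}, ratio: {ratio:.2f}")
```

Under these assumptions the standard error of the comparison is a bit more than twice as large with the 5/95 split as with an even split of the same total sample, which is the loss of power that forcing coverage is meant to avoid.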

## Q2 "Is it wrong to condition on a variable where stratification was not done?"

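No, there's no issue with this. Conditioning on a covariate after the fact is fine as long as it was measured before treatment, so it cannot have been affected by it.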

## Q3 "What about continuous variables, is there an agreed upon approach?"

Regarding continuous variables, it really depends on the question you want to answer. You'd have to bin them in some way, and the most appropriate way would probably be determined by your theoretical understanding, past experimental results, and the research question of interest. But your idea sounds like a good start.

I don't really have anything to say about the ANCOVA question. Maybe someone else has something to contribute.