Solved – Why is random assignment important in stratified sampling

random allocation, sampling, stratification

Background

I raised this question because of a disagreement I am having over a question from user697473 here. The title of his question is "Formal definition of random assignment." In the post he claims that one can get unbiased estimates of the difference in treatment effect even if one group that is represented in the population is not sampled. At least that is the way I interpret the claim. He points to a paper on causal modeling coauthored by Don Rubin here. Both he and StasK argue through examples that this claim is true. This issue was also raised in another question that he posted here. In that post, titled "Random assignment: why bother?", gung expresses skepticism.

In the first question, my answer included a statement to the effect that you cannot construct an unbiased estimate of the difference in treatment effect unless you assign samples to every stratum of a potential confounding factor.

The Issue

I am confused about the claim. As I interpret it, I am sure it is false. The claim is supported by two examples, one given by user697473 and the other by StasK. The examples are confusing to me, and in neither case is a proof or demonstration given to show that the claim is true.

I could be misinterpreting the claim, but based on my interpretation it is false. To illustrate my point, let's look at a famous example from the design of experiments book of the godfather of randomization, Sir Ronald Aylmer Fisher: "The Lady Tasting Tea".

In case you don't know the example, I will describe it and then modify it to make my point. The Lady claims that she can determine, just by tasting a cup of tea, whether the tea or the milk was poured first. Fisher proposes an experiment to test her claim by providing some cups with tea poured first and others with milk poured first. If the Lady has the expertise she claims, she should be able to identify the types correctly more often than a person who guesses at random. So the experiment is designed to see whether her probability of correct classification is better than 0.5. He therefore presents the Lady with cups from both groups in random order (in Fisher's original account there were 8 cups, 4 with milk poured first and 4 with tea poured first).

Now let me modify the experiment slightly. Suppose I have three brands of tea and, in my population, these brands are served at tea parties equally often. The question I want to ask is: when given a cup of tea at a tea party, can the Lady predict whether the milk or the tea was poured first? I want to test whether her prediction accuracy is greater than 0.5 in the setting of tea parties. So I will apply Fisher's experiment to estimate this probability when the Lady is at one of these tea parties.
Note: I do not care whether or not she can differentiate the brands.

Suppose that in the population of tea parties the Lady's ability is:

Brand A: can predict tea first with probability 0.7 and milk first with probability 0.7.

Brand B: can predict tea first with probability 0.75 and milk first with probability 0.75.

Brand C: can predict tea first with probability 0.5 and milk first with probability 0.6.

So with Brand C she does no better than chance at identifying tea-first cups (and only slightly better at milk-first cups), but she does clearly better than chance with Brands A and B.

For the experiment I only provide Brands A and B for her to taste. All I want to know is her ability to predict correctly in the normal tea party situation.

My Questions

  1. Can I get an unbiased estimate of her prediction capabilities with this experiment?

  2. If the answer to (1) is no, is user697473's claim false or have I misinterpreted it?

Randomization is used to avoid the bias produced by confounding variables. This is important when trying to draw inferences from a sample to a population. In my example the brands are confounders. The treatment is tasting the cups. In the population she would get Brand A, Brand B and Brand C each 1/3 of the time. If I know this and I sample brands with unequal probabilities, I claim that I can still get an unbiased estimate of her prediction capabilities by taking a specific weighted average of the estimated proportions (each stratum's proportion weighted by its population share of 1/3), provided I use a stratified random sample with stratum sample sizes $n_1$, $n_2$ and $n_3$ all greater than 0. But I cannot if $n_3 = 0$. Furthermore, there is no other estimate I can calculate from a sample in which the allocation to Brand C equals 0 that will be unbiased.
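The weighted-average argument can be checked with exact expectations rather than simulation. The sketch below uses the brand accuracies listed in the question and assumes, which the question does not state, that tea-first and milk-first cups are equally common within each brand. The names `brand_accuracy` and `expected_estimate` and the allocation (10, 20, 30) are illustrative choices, not part of the original post.

```python
from fractions import Fraction as F

# Probability of a correct call per brand: (tea-first cups, milk-first cups),
# copied from the question.
ABILITY = {"A": (F(7, 10), F(7, 10)),
           "B": (F(3, 4), F(3, 4)),
           "C": (F(1, 2), F(3, 5))}

# Each brand is served at 1/3 of tea parties (stated in the question).
POPULATION_SHARE = {"A": F(1, 3), "B": F(1, 3), "C": F(1, 3)}


def brand_accuracy(brand):
    """Accuracy within one brand, ASSUMING a 50/50 mix of cup types."""
    tea_first, milk_first = ABILITY[brand]
    return (tea_first + milk_first) / 2


def expected_estimate(weights):
    """Exact expectation of sum_h w_h * p_hat_h over the sampled strata.

    p_hat_h is the sample proportion correct in stratum h; its expectation
    is the true stratum accuracy whenever n_h > 0.
    """
    return sum(w * brand_accuracy(b) for b, w in weights.items())


target = expected_estimate(POPULATION_SHARE)  # population accuracy

# Unequal allocation (hypothetical n_h), pooled without reweighting:
# the implicit weights are n_h / n, so the estimate is biased.
n = {"A": 10, "B": 20, "C": 30}
pooled = expected_estimate({b: F(n_h, sum(n.values())) for b, n_h in n.items()})

# Same allocation, reweighted by population share: unbiased for any n_h > 0.
stratified = expected_estimate(POPULATION_SHARE)

# n_3 = 0: only A and B can receive weight (here 1/2 each).
without_c = expected_estimate({"A": F(1, 2), "B": F(1, 2)})

for name, value in [("pooled", pooled), ("stratified", stratified),
                    ("without C", without_c)]:
    print(f"{name:10s} expectation = {value}, bias = {value - target}")
```

Under these assumptions the population accuracy is 2/3, the reweighted stratified estimate has zero bias, and the estimate that omits Brand C has expectation 29/40, a bias of 7/120. More generally, the expectation of any estimate built only from Brand A and Brand B data depends only on those two accuracies, so it cannot track Brand C's accuracy.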

Best Answer

You have not correctly interpreted user697473's claim. He is not talking about failing to include any data from Brand C. He was talking about giving a particular vector of assignments probability $0$. He was not saying that you can magically determine the value of some variable while never testing it. He wants to be able to use a balanced random subset, in which each point is included with the right probability, but not necessarily a uniformly random one.

For example, if the set is $\lbrace x_1,x_2,x_3,x_4 \rbrace$, then the following random subsets of uniform size $2$ all have the property that the probability that $x_i$ is included is $1/2$:

$S_1 = 1/6\lbrace x_1,x_2\rbrace + 1/6\lbrace x_1,x_3\rbrace + 1/6\lbrace x_1,x_4\rbrace + 1/6\lbrace x_2,x_3\rbrace + 1/6\lbrace x_2,x_4\rbrace + 1/6\lbrace x_3,x_4\rbrace $

$S_2 = 1/4\lbrace x_1,x_3\rbrace + 1/4\lbrace x_1,x_4\rbrace + 1/4\lbrace x_2,x_3\rbrace + 1/4\lbrace x_2,x_4\rbrace $

$S_3 = 1/2\lbrace x_1,x_2\rbrace + 1/2\lbrace x_3,x_4\rbrace $

These are all balanced in the sense that if you compute the average value of some function $f$ over the random set, the expected value is $1/4(f(x_1) + f(x_2) + f(x_3)+f(x_4))$. In the third random subset, the probability of the subset $\lbrace x_1,x_3 \rbrace$ is $0$.
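The balance property can be verified by enumeration. The following sketch encodes the three designs as dictionaries from subsets to probabilities; the values assigned to $f$ are arbitrary illustrative numbers, and the helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

POINTS = ("x1", "x2", "x3", "x4")

# Each design maps a 2-element subset to the probability of drawing it.
S1 = {frozenset(pair): F(1, 6) for pair in combinations(POINTS, 2)}
S2 = {frozenset(pair): F(1, 4)
      for pair in [("x1", "x3"), ("x1", "x4"), ("x2", "x3"), ("x2", "x4")]}
S3 = {frozenset(pair): F(1, 2) for pair in [("x1", "x2"), ("x3", "x4")]}
DESIGNS = {"S1": S1, "S2": S2, "S3": S3}


def inclusion_probabilities(design):
    """Probability that each point ends up in the random subset."""
    return {x: sum((p for s, p in design.items() if x in s), F(0))
            for x in POINTS}


def expected_sample_mean(design, f):
    """Expected value of the average of f over the random subset."""
    return sum(p * sum(f[x] for x in s) / len(s) for s, p in design.items())


# Arbitrary illustrative values of f (not from the original post).
f = {"x1": F(3), "x2": F(10), "x3": F(-2), "x4": F(5)}
population_mean = sum(f.values()) / len(f)

for name, design in DESIGNS.items():
    print(name,
          "inclusion:", sorted(inclusion_probabilities(design).items()),
          "E[sample mean]:", expected_sample_mean(design, f))
print("population mean:", population_mean)
print("P({x1, x3}) under S3:", S3.get(frozenset({"x1", "x3"}), F(0)))
```

Every design gives each point inclusion probability 1/2 and an expected sample mean equal to the population mean, even though $S_3$ assigns probability 0 to $\lbrace x_1,x_3 \rbrace$.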

That said, the point of the experiment should not be to produce an unbiased estimate. That is just one consideration. Another goal is to provide useful information. If you know that you may want to estimate $f$ on a subset $T$ (say $\lbrace x_1,x_2 \rbrace$) and its complement and to subtract the two, the quality of your estimate depends on $\#(T \cap S)$ and $\#(T^c \cap S)$. Then not all balanced subsets have the same quality. For that task, $S_3$ is worse than random assignment ($S_1$), while $S_2$ is better than random assignment.
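To make the quality comparison concrete, one can tabulate how each design splits its sample between $T = \lbrace x_1,x_2 \rbrace$ and its complement. This is a minimal sketch; using "both counts are positive" as the criterion for the difference being estimable from a single draw is an illustrative simplification, and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

POINTS = ("x1", "x2", "x3", "x4")
T = frozenset({"x1", "x2"})

# The three balanced designs from the answer: subset -> probability.
S1 = {frozenset(pair): F(1, 6) for pair in combinations(POINTS, 2)}
S2 = {frozenset(pair): F(1, 4)
      for pair in [("x1", "x3"), ("x1", "x4"), ("x2", "x3"), ("x2", "x4")]}
S3 = {frozenset(pair): F(1, 2) for pair in [("x1", "x2"), ("x3", "x4")]}
DESIGNS = {"S1": S1, "S2": S2, "S3": S3}


def split_distribution(design):
    """Distribution of (#(T ∩ S), #(T^c ∩ S)) over the random subset S."""
    dist = {}
    for subset, prob in design.items():
        key = (len(subset & T), len(subset - T))
        dist[key] = dist.get(key, F(0)) + prob
    return dist


def prob_difference_estimable(design):
    """Probability that S contains at least one point from T and from T^c."""
    return sum((p for (in_t, out_t), p in split_distribution(design).items()
                if in_t > 0 and out_t > 0), F(0))


for name, design in DESIGNS.items():
    print(name, dict(split_distribution(design)),
          "P(both groups sampled) =", prob_difference_estimable(design))
```

Under $S_2$ the sample always contains one point from each group, under $S_1$ this happens with probability 2/3, and under $S_3$ it never happens, which matches the ranking given above.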