Solved – Is the Fisher Sharp Null Hypothesis testable?

causality, hypothesis-testing, permutation-test

Under the potential outcomes framework for causal analysis, let $Y_i(W_i)$ be the potential outcome of subject $i$ when the treatment received is $W_i\in\{0,1\}$. In reality, we observe at most one potential outcome per subject, i.e., if you observe $Y_{i}(1)$, then you cannot observe $Y_i(0)$.

The Fisher Sharp Null Hypothesis is the following:

$H_0: Y_i(0)=Y_i(1),~\forall~i\in\{1,2,\dots,N\}$.

This means that under $H_0$ the treatment has no effect for ANY subject. The observed outcomes are then merely the result of the randomization assignment procedure $\mathbf{W}$ (e.g., the randomization procedure could be that each subject flips a fair coin: heads go to the treatment group, tails go to the control group). So we can compute the EXACT distribution of any test statistic based on $\mathbf{W}$, and then check the p-value of the observed test statistic against the distribution we just derived under $H_0$. This is the permutation test.
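As a concrete illustration, here is a minimal sketch of that randomization logic. The outcome vector, the assignment vector, and the choice of difference-in-means as the test statistic are all made up for the example, and permuting $\mathbf{W}$ corresponds to a completely randomized design (number of treated units held fixed) rather than independent coin flips:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: under the sharp null the outcomes are fixed
# numbers, and only the assignment vector W is random.
y = np.array([3.1, 2.4, 4.0, 1.8, 2.9, 3.5, 2.2, 3.8])   # observed outcomes
w = np.array([1,   0,   1,   0,   1,   1,   0,   0])      # observed assignment

def diff_in_means(y, w):
    return y[w == 1].mean() - y[w == 0].mean()

t_obs = diff_in_means(y, w)

# Under H0: Y_i(1) = Y_i(0), the observed y would be unchanged under any other
# assignment, so re-randomizing W rebuilds the exact null distribution of the
# statistic (approximated here by Monte Carlo over random permutations of W).
n_draws = 10_000
t_null = np.array([diff_in_means(y, rng.permutation(w)) for _ in range(n_draws)])

# Two-sided p-value: how extreme is the observed statistic under re-randomization?
p_value = np.mean(np.abs(t_null) >= np.abs(t_obs))
print(f"observed statistic = {t_obs:.3f}, permutation p-value = {p_value:.3f}")
```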

Now here comes my question. In practice, people often use as the test statistic $T=$ the difference in means (or medians, or ranks) between the two groups. However, in my eyes, these test statistics capture only some aspect of $H_0$, not $H_0$ itself. Under $H_0$, one can construct as many test statistics as one wants, and it could be that some of them have small p-values while others have large p-values. In that case, what should we do? Should we accept $H_0$ or reject it? In scientific research, some researchers simply check many metrics (test statistics) on the data and report only those with small p-values, which is not good for the readers.

The $H_0$ is about the whole distribution of the difference between $Y(1)$ and $Y(0)$ (according to Rubin, the potential outcome vectors $Y(1)$ and $Y(0)$ are not random, and the randomization assignment procedure is the only source of randomness, so here the word “distribution” means a frequency plot). But all that people check in practice is a partial property of that distribution, e.g., whether the means or medians of the two distributions are equal. In Statistics 101, the $H_0$ we usually face is itself about whether the means of the two groups differ, and in that case it makes sense to use the difference in group means as the test statistic.

So for this Fisher sharp $H_0$, should we always use something like the Kolmogorov–Smirnov test statistic?
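To make the comparison concrete, here is a sketch with made-up data in which the treatment changes the spread more than the mean, so the two statistics will typically disagree. SciPy's `ks_2samp` is used only as a convenient way to compute the two-sample KS statistic; both p-values are taken from the same set of re-randomizations:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Hypothetical data: a "treatment" that changes spread more than location.
y = np.concatenate([rng.normal(0, 1, 20), rng.normal(0, 3, 20)])
w = np.array([0] * 20 + [1] * 20)

def mean_diff(y, w):
    return y[w == 1].mean() - y[w == 0].mean()

def ks_stat(y, w):
    # Two-sample Kolmogorov-Smirnov statistic, used purely as a test statistic.
    return ks_2samp(y[w == 1], y[w == 0]).statistic

def permutation_pvalue(stat, y, w, n_draws=5000):
    t_obs = stat(y, w)
    t_null = np.array([stat(y, rng.permutation(w)) for _ in range(n_draws)])
    return np.mean(np.abs(t_null) >= np.abs(t_obs))

print("mean-difference permutation p-value:", permutation_pvalue(mean_diff, y, w))
print("KS-statistic    permutation p-value:", permutation_pvalue(ks_stat, y, w))
```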

Best Answer

Your $H_{0}$ implies $E[Y_{i}(1) - Y_{i}(0)] = 0$, which is testable whenever we have identified this ATE, by whatever means. By elementary logic, rejecting an implication of a statement rejects the statement itself. So this is one way to test the Fisher sharp null.
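In symbols, the contrapositive being used here is

$$\bigl(H_0^{\text{Fisher}} \Rightarrow \text{ATE}=0\bigr) \;\equiv\; \bigl(\text{ATE}\neq 0 \Rightarrow \neg H_0^{\text{Fisher}}\bigr),$$

so evidence that the ATE is nonzero is evidence against the sharp null, while finding ATE $=0$ does not by itself establish the sharp null.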

However, one may indeed be interested in whether $H_{0}$ itself holds or whether just ATE $= 0$ holds. This was actually a debate between Fisher and Neyman. It turns out one way to test this is to identify both the ATE and the ATT/ATU. The trick is to realize that Neyman's hypothesis is about the ATE, which may be zero, while non-zero individual-level effects could still lead to the ATT/ATU being different from zero, e.g., $E[Y_{i}(1) - Y_{i}(0)|T = 1] \neq 0$. This means that when you find the ATT/ATU to be different from zero while the ATE is zero, the Fisher sharp null can be rejected even though the Neyman weak null holds.
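A toy finite population (numbers entirely hypothetical) makes this concrete: suppose the units who gain from treatment are exactly the ones who take it, while the units who are harmed stay in control, so individual effects cancel in the ATE but not in the ATT:

```python
import numpy as np

# Toy finite population (hypothetical numbers): individual effects cancel on
# average, but the units who benefit are exactly the ones who take treatment.
y0 = np.zeros(10)                          # Y_i(0) for all 10 units
y1 = np.array([1]*5 + [-1]*5, float)       # Y_i(1): +1 for half, -1 for half
t  = np.array([1]*5 + [0]*5)               # the "+1" units select treatment

ate = (y1 - y0).mean()                     # average effect over everyone
att = (y1 - y0)[t == 1].mean()             # average effect among the treated

print(ate, att)                            # 0.0 and 1.0: Neyman's null holds,
                                           # the Fisher sharp null clearly fails
```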

In general, the ATT is identified only with observational, not with experimental data. However, Pearl also observed that in the binary-treatment case, a combination of observational and experimental data identifies the ATT without the need to rely on extra assumptions. That is, experimental data give you $E[Y_{i}(1)]$ and $E[Y_{i}(0)]$, and observational data give you $E[Y_{i}|T_{i}]$ and $P(T_{i})$. The ATT is then identified as $$E[Y_{i}|T_{i} = 1] - \dfrac{E[Y_{i}(0)] - E[Y_{i}|T_{i} = 0]\,(1 - P(T_{i}))}{P(T_{i})}.$$
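As a quick sanity check on that expression (using the same style of hypothetical toy population as above, so all numbers are made up), one can compute the "experimental" quantities $E[Y_{i}(1)]$, $E[Y_{i}(0)]$ and the "observational" quantities $E[Y_{i}|T_{i}]$, $P(T_{i})$ directly and confirm that the formula reproduces the ATT computed from the full set of potential outcomes:

```python
import numpy as np

# Toy population (hypothetical numbers), same selection pattern as above.
y0 = np.zeros(10)
y1 = np.array([1]*5 + [-1]*5, float)
t  = np.array([1]*5 + [0]*5)
y_obs = np.where(t == 1, y1, y0)           # what we actually get to observe

# "Experimental" quantities (what a randomized experiment would identify):
Ey1, Ey0 = y1.mean(), y0.mean()
# "Observational" quantities (identified from the observed selection regime):
Ey_t1 = y_obs[t == 1].mean()
Ey_t0 = y_obs[t == 0].mean()
p_t   = t.mean()

att_formula = Ey_t1 - (Ey0 - Ey_t0 * (1 - p_t)) / p_t
att_true    = (y1 - y0)[t == 1].mean()
print(att_formula, att_true)               # both equal 1.0 here
```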

The other problem you describe, checking many metrics and reporting only those with small p-values, is p-hacking, which is of course a widespread problem, but not really related to causal inference per se.

Source: Pearl, Judea. "Detecting latent heterogeneity." Sociological Methods & Research (2015)
