Distribution of “p-value-like” quantities under null hypothesis

Tags: distributions, hypothesis testing, p-value, uniform distribution

It is well established that p-values are uniformly distributed when the null hypothesis is true (at least for a continuous test statistic). This follows from the definition of a p-value:

The probability of observing a value at least as extreme as the one observed, when values are drawn from the known, fixed null distribution (i.e. the null is true).
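A quick way to see this uniformity is a small simulation. The sketch below assumes a one-sample t-test on normal data, chosen only for illustration; the sample sizes and the name `max_dev` are not from the original post.

```r
set.seed(42)

# 5000 one-sample t-tests, each on 20 draws from N(0, 1), so the null (mean 0) is true
p <- replicate(5000, t.test(rnorm(20), mu = 0)$p.value)

# under the null these empirical quantiles should sit close to 0.1, 0.5 and 0.9
quantile(p, c(0.1, 0.5, 0.9))

# largest gap between the sorted p-values and uniform plotting positions
max_dev <- max(abs(sort(p) - ppoints(length(p))))
max_dev
```

A small `max_dev` means the empirical distribution of the p-values stays close to the uniform one.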


This fact allows for a range of follow-up analyses based on the distribution of p-values.

See http://varianceexplained.org/statistics/interpreting-pvalue-histogram/ for examples of looking at p-value histograms.

However, I am concerned with a different probability. Instead of the proportion of observations that are 'more extreme', I would like to know the proportion of observations that are 'more rare'.

It is true that 'more extreme' implies 'more rare'; however, 'more rare' does not imply 'more extreme', particularly for multimodal null distributions, as shown in the two images below. An observation could be near the mean and still be rare, coming from a low-density portion of the null distribution.

Figure 1: regular p-value

Figure 2: "d-value" (for lack of a better term)

One-sided p-value: $$P(X > x \mid H_0)$$

For my 'd-values':
$$P(\theta(X) \le \theta(x) \mid H_0)$$

where $\theta$ is a density function (which in my case comes from a simple univariate KDE).
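To make the two definitions concrete, here is a minimal sketch that estimates both quantities from a simulated null sample. The bimodal null (means -3 and 3), the sample size, and the names `x_null`, `theta_null`, `p_value` and `d_value` are assumptions for illustration only; `theta` is built from `density()` with its default bandwidth, which may differ from the KDE actually used.

```r
set.seed(1)

# hypothetical null sample: equal mixture of N(-3, 1) and N(3, 1)
x_null <- c(rnorm(5000, mean = -3), rnorm(5000, mean = 3))

# univariate KDE on a grid, turned into a function by linear interpolation
kde <- density(x_null)
theta <- approxfun(kde$x, kde$y, yleft = 0, yright = 0)
theta_null <- theta(x_null)

# one-sided p-value: proportion of null draws larger than x
p_value <- function(x) mean(x_null > x)

# d-value: proportion of null draws whose estimated density is at most theta(x)
d_value <- function(x) mean(theta_null <= theta(x))

p_value(0)  # x = 0 is the mean of the null, so not extreme
d_value(0)  # but it sits in the low-density valley between the modes
d_value(3)  # an observation near a mode, for comparison
```

Under these assumptions the observation at 0 gets an unremarkable p-value and a small d-value, which is the distinction between 'extreme' and 'rare' described above.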

Questions:

  • 1) What are these "d-values" called?
    I can't be the first person to have this question?

  • 2) How are these "d-values" distributed under $H_0$?

    Let $0 \le \beta \le \max_x(\theta(x))$ (the density at the highest mode)

    $P(\theta(X) \le 0) = 0$

    $P(\theta(X) \le \max_x(\theta(x))) = 1$

    $P(\theta(X) \le \beta) = {}$??

    This is kind of like a vertical integration over density values, but leaving out any region where the density exceeds the threshold.

  • 3) Does the distribution in 2) hold no matter what form the
    distribution of observations takes under $H_0$? (It does for p-values,
    which are uniform.)
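One way to write down the quantity in question 2, assuming $\theta$ is the null density itself (with a KDE it is only an estimate) and using $G$ as a label introduced here, is as an integral of the density over the region where it does not exceed the threshold:

$$G(\beta) = P(\theta(X) \le \beta \mid H_0) = \int_{\{x \,:\, \theta(x) \le \beta\}} \theta(x)\,dx, \qquad 0 \le \beta \le \max_x \theta(x).$$

In this notation the d-value of an observation $x$ is $G(\theta(x))$, that is, the CDF of the random variable $\theta(X)$ evaluated at its own observed value.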

Best Answer

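Let $f$ be the density of $X$. You are concerned with the distribution of the "d-values" $$d = P( f(X) < f(x_{obs}))$$ when $x_{obs}$ is drawn from the distribution of $X$.

Let's construct another random variable by transforming $X$: $Y = f(X)$, and let $y_{obs} = f(x_{obs})$. Then you are in fact looking at the distribution of $$P(Y < y_{obs})$$ when $y_{obs}$ is drawn from the distribution of $Y$.

This is the CDF of $Y$ evaluated at a draw of $Y$, so by the probability integral transform it is uniform on $[0, 1]$, provided $Y$ has a continuous distribution (that is, $f$ is not constant on any set of positive probability). Under that condition the result does not depend on the shape of $f$.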
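The uniformity claim can be checked in one line. The sketch below assumes the CDF $F_Y$ of $Y$ is continuous, so that $P(Y < y_{obs}) = F_Y(y_{obs})$, and introduces $y_u$ as notation. For $0 < u < 1$:

$$P\big(F_Y(Y) \le u\big) = P(Y \le y_u) = F_Y(y_u) = u, \qquad y_u = \sup\{y : F_Y(y) \le u\}.$$

The first equality holds because $F_Y$ is nondecreasing, and the last because continuity gives $F_Y(y_u) = u$. If $f$ has a flat region carrying positive probability, $F_Y$ jumps and the argument no longer goes through.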

A quick numerical experiment

Consider a mixture of two Gaussians with variance 1 and means 0 and 4, in equal proportions. Its density $f$ is bimodal, with modes near 0 and 4.

Now for the numerical experiment:

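# mixture density: equal weights on N(0, 1) and N(4, 1)
f <- function(x) 0.5 * dnorm(x) + 0.5 * dnorm(x, mean = 4)

# a reference sample to compute d-values
X_ref <- c( rnorm(1e4), rnorm(1e4, mean = 4) )
f_ref <- f(X_ref)

# a set of observations
x_obs <- c( rnorm(1e4), rnorm(1e4, mean = 4) )

# the d-values: proportion of the reference sample with lower density
d <- sapply(f(x_obs), function(y) mean(f_ref < y))

# sorted d-values against uniform plotting positions
plot(ppoints(2e4), sort(d), pch = ".")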

[Figure: sorted d-values plotted against uniform plotting positions]
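To probe question 3, the same experiment can be repeated with a differently shaped null. The sketch below is not part of the original experiment: it assumes a Gamma(shape = 2, rate = 1) null, and the names `n`, `f_ref`, `f_obs` and `max_dev` are introduced here. It replaces the plot with a single number, the largest gap between the sorted d-values and uniform plotting positions.

```r
set.seed(123)
n <- 5000

# hypothetical alternative null: Gamma with shape 2 and rate 1 (skewed, unimodal)
f <- function(x) dgamma(x, shape = 2, rate = 1)

# density values of a reference sample and of fresh observations
f_ref <- f(rgamma(n, shape = 2, rate = 1))
f_obs <- f(rgamma(n, shape = 2, rate = 1))

# d-values: proportion of the reference sample with lower density
d <- sapply(f_obs, function(y) mean(f_ref < y))

# largest gap between the sorted d-values and uniform plotting positions
max_dev <- max(abs(sort(d) - ppoints(n)))
max_dev
```

A value of `max_dev` close to zero is consistent with uniform d-values; it is a rough check, not a formal test.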
