[Math] Null hypothesis test for independent but not identically distributed samples

pr.probability st.statistics

I'm trying to find the best statistical test for an edge case I've run into: evaluating the likelihood of the null hypothesis for a set of samples that each (potentially) come from a different, known distribution.

These distributions can be thought of as normal distributions generated from convolving discrete uniform distributions of various sizes. In general, these should be normal-shaped, but in the worst case a sample can be generated from a pure discrete-uniform distribution. However, in all cases, the shape and CDF of the distribution for the null-hypothesis is known.

I would be interested in any of the following hypothesis tests:

  1. Test that works based on a set of probabilities from the CDF of various distributions
  2. Test that works based on samples drawn from distributions of known variance (few assumptions made on the distribution shape, could be uniform, normal, etc)
  3. Test that works based on samples drawn from normal distributions of different variance

I'd consider the first two better because they're more robust. The third would be useful also, though I'd need to run a pre-test to determine if the distributions are normal enough to be appropriate.

The major hurdle is that the test can't assume that any two samples are from the same distribution. However, the CDF of each distribution is approximately known (so the mean and variance are known). The samples are independently distributed, but not identically distributed. From a practical standpoint, this can be thought of as:

$x_1 \sim \sigma_1^2$

$x_2 \sim \sigma_2^2$

$\vdots$

$x_N \sim \sigma_N^2$
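To make this setup concrete, here is a minimal sketch of one possible approach, assuming the null mean and standard deviation of every distribution are known: standardize each sample by its own parameters and run a z-test on the scaled sum. The function name `standardized_sum_test` and its arguments are hypothetical. The result is exact only if every null distribution is normal; otherwise it relies on a central-limit approximation, and it is only sensitive to a shift in location that is shared across samples.

```python
import math
from statistics import NormalDist


def standardized_sum_test(samples, means, sigmas):
    """Two-sided z-test on the scaled sum of standardized samples.

    Assumes (hypothetically) that means[i] and sigmas[i] are the known
    null mean and standard deviation of the distribution behind samples[i].
    """
    if not samples or not (len(samples) == len(means) == len(sigmas)):
        raise ValueError("need equally sized, non-empty inputs")
    # Each z_i has mean 0 and variance 1 under the null hypothesis.
    z_scores = [(x - m) / s for x, m, s in zip(samples, means, sigmas)]
    # The sum of N such values has variance N, so divide by sqrt(N).
    z = sum(z_scores) / math.sqrt(len(z_scores))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

A different alternative hypothesis (for example, inflated variance rather than a shifted mean) would call for a different statistic, such as the sum of squared standardized values.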

I know for a standard bunch of samples ($\overrightarrow{X}$) from a single normal distribution, you can use a z-test or t-test. For two sets of samples with unknown variance ($\overrightarrow{X} \sim \sigma_x^2$ , $\overrightarrow{Y} \sim \sigma_y^2$), Welch's t-test would do the trick. However, I am not familiar with a test that handles the case that I'm looking at (known structure for the null hypothesis across many distributions that each have only one sample drawn from them).

Does anyone know the general way people approach this type of issue? I can calculate the CDF for each individual sample under the null-hypothesis assumption. My issue is the appropriate method to integrate this information into a null hypothesis test, since it's meaningless to test a null hypothesis with 1 sample no matter what test you use.
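For illustration, one standard way to pool per-sample CDF values is Fisher's method for combining independent p-values. The sketch below is not taken from this thread; it assumes each null CDF is continuous, so that $u_i = F_i(x_i)$ is uniform on $(0, 1)$ under the null hypothesis. For the discrete-uniform worst case this holds only approximately. The helper names `two_sided_p` and `fisher_combined_p` and the example distributions are made up for the example.

```python
import math
from statistics import NormalDist


def two_sided_p(u):
    """Turn a null-CDF value u = F_i(x_i) into a two-sided p-value."""
    return 2.0 * min(u, 1.0 - u)


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p_i) ~ chi-square with 2N d.o.f.

    Uses the closed-form chi-square survival function for even degrees
    of freedom. For very large N this direct form can overflow or
    underflow; a log-space version would be safer there.
    """
    n = len(p_values)
    if n == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    # sf(x; 2n) = exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total


# Hypothetical example: three samples, each with its own known null.
null_dists = [NormalDist(0.0, 1.0), NormalDist(10.0, 3.0), NormalDist(-2.0, 0.5)]
samples = [0.4, 12.1, -2.3]
p_values = [two_sided_p(d.cdf(x)) for d, x in zip(null_dists, samples)]
combined = fisher_combined_p(p_values)
print(f"combined p-value: {combined:.4f}")
```

Fisher's method is most sensitive when a few samples are very unlikely under their own null distribution. Comparing the $u_i$ against a uniform distribution (for example with a Kolmogorov-Smirnov test) is another option under the same continuity assumption.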

Does anyone know something like Welch's test that works appropriately if each population size is 1 and there are a large number of populations? Even better would be an approach that uses the straight CDF probabilities or one that can handle a mixture of samples drawn from uniform or normal distributions (without throwing out the known structure). I'm also open to throwing some Bayesian statistics at the problem, but would prefer a null-hypothesis formulation because people are more familiar with interpreting those.

Best Answer

If you know the exact distributions, why throw away this precious information?

You do not make it very clear what hypothesis exactly you are trying to test... but the Bayesian way to solve this problem would be to start with a prior over which distribution each sample is drawn from. Using the likelihood that each distribution would produce each sample, you can compute the posterior probability that each sample comes from each distribution.
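As a rough sketch of that computation for a single sample, assuming the candidate distributions are normal and fully known: multiply each prior weight by the density of the sample under that candidate, then normalize. The function name `posterior_over_candidates`, the two candidates, and the equal prior are hypothetical choices for the example.

```python
from statistics import NormalDist


def posterior_over_candidates(x, candidates, prior):
    """Posterior probability that sample x came from each candidate.

    candidates: list of NormalDist objects (assumed known distributions).
    prior: list of prior probabilities, one per candidate.
    """
    if len(candidates) != len(prior) or not candidates:
        raise ValueError("need one prior weight per candidate")
    # Bayes' rule: posterior is proportional to prior * likelihood.
    weights = [p * d.pdf(x) for d, p in zip(candidates, prior)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("sample has zero density under every candidate")
    return [w / total for w in weights]


# Hypothetical example: two candidate distributions with equal prior weight.
candidates = [NormalDist(0.0, 1.0), NormalDist(3.0, 1.0)]
posterior = posterior_over_candidates(0.0, candidates, [0.5, 0.5])
print(posterior)
```

With a non-normal candidate (such as a discrete uniform), the `pdf` call would be replaced by that distribution's probability mass or density function.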

I can elaborate if you specify what it is you're trying to test for.