Hypothesis Testing – How to Statistically Compare Two Large Continuous Datasets

Tags: hypothesis testing, numpy, scipy, statistical significance, t-test

I have multiple images with a single band/channel: think of an RGB image of which I only have the blue band/channel. In other words, I have multiple 1D datasets (1D arrays).

I would like to statistically compare each pair of images, where a pair means two successive images.

Each image contains about 50,000 pixels (values). For example, one image may have 50,345 values and the next 50,433. The counts are not dramatically different, but they are not always equal, so any method that requires arrays of equal length is not adequate here. Moreover, the pixel at coordinates (x, y) in image (array) #1 does not necessarily correspond to the pixel at the same location in image #2.

Consider these two examples (each color corresponds to a different image, i.e., a different array of values):

[Image: first example]

[Image: second example]

Put non-statistically, the blue and red are similar, while the red and green are different.

I would like to perform a statistical test that quantifies this difference, so that I can choose a threshold and decide whether the two are similar enough for my application.

My question is: which statistical test, model, or method is appropriate for this, given distributions like those in the examples, i.e., distributions that are not exactly Gaussian?

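The t-test and z-test do not work here: with about 50,000 values per image, the sample size (and hence the number of degrees of freedom) is huge, so even a tiny difference in means gives a p-value of effectively 0. For example, here are two of the many t-test variants I tried: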

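>>> from scipy import stats
>>> # Paired t-test (expects arrays of equal length)
>>> stats.ttest_rel(img1, img2, nan_policy='omit')
Ttest_relResult(statistic=-90.27773456178737, pvalue=0.0)
>>> # Welch's t-test (independent samples, unequal variances)
>>> stats.ttest_ind(img1, img3, nan_policy='omit', equal_var=False)
Ttest_indResult(statistic=360.2704559875767, pvalue=0.0)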
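Zero p-values are what a large sample should produce. The standard error of the mean difference shrinks roughly as 1/sqrt(n), so with ~50,000 values even a practically negligible shift becomes "significant". The sketch below illustrates this with synthetic normal data (an assumption; these are not the images from the question). As a contrast, it computes Cohen's d, a standardized effect size. Cohen's d is a suggestion added here, not something from the original post. It stays near the true shift of 0.05 standard deviations regardless of n, while the p-value collapses.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Standardized mean difference of two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Synthetic stand-in data (assumption): the second sample's mean is shifted
# by only 0.05 standard deviations, a practically negligible difference.
rng = np.random.default_rng(0)
results = {}
for n in (500, 50_000):
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.05, scale=1.0, size=n + 88)  # unequal lengths, as in the question
    pvalue = stats.ttest_ind(a, b, equal_var=False).pvalue  # Welch's t-test
    results[n] = (pvalue, cohens_d(a, b))
    print(f"n={n:>6}: p-value={pvalue:.3g}, Cohen's d={results[n][1]:.3f}")
```

The same tiny shift is typically non-significant at n = 500 but overwhelmingly significant at n = 50,000. That is why a p-value alone makes a poor similarity score for arrays this large.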

I thought of calculating a distance between the datasets, or the overlapping area of their histograms (which seems better than comparing the means), but I'm not sure which method (preferably in Python) is appropriate for such a task.
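Both ideas are straightforward to compute. The following is a minimal sketch, assuming the arrays may contain NaNs (as the `nan_policy='omit'` calls suggest). `histogram_overlap` is a hypothetical helper that bins both arrays on shared edges and normalizes each histogram to proportions, so differing pixel counts don't matter. It then sums the bin-wise minimum, which gives 1 for identical histograms and 0 for non-overlapping ones. For the distance, `scipy.stats.wasserstein_distance` (the earth mover's distance, a swapped-in choice) reports how far, in the data's own units, one distribution must be moved to match the other. The synthetic arrays are stand-ins, and `bins=100` is an arbitrary default.

```python
import numpy as np
from scipy import stats


def histogram_overlap(a, b, bins=100):
    """Overlap of two normalized histograms: 1.0 = identical, 0.0 = no overlap."""
    a = a[~np.isnan(a)]  # drop NaNs, like nan_policy='omit'
    b = b[~np.isnan(b)]
    # Shared bin edges spanning both samples so the two histograms line up.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    pa, _ = np.histogram(a, bins=edges)
    pb, _ = np.histogram(b, bins=edges)
    # Counts -> proportions, so arrays of different length are comparable.
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return float(np.minimum(pa, pb).sum())


def wasserstein(a, b):
    """Earth mover's distance between the two empirical distributions (in data units)."""
    return float(stats.wasserstein_distance(a[~np.isnan(a)], b[~np.isnan(b)]))


# Synthetic stand-ins (assumption) for a similar pair and a dissimilar pair.
rng = np.random.default_rng(1)
img1 = rng.normal(100, 10, 50_345)
img2 = rng.normal(101, 10, 50_433)
img3 = rng.normal(130, 15, 50_210)

overlap_12, overlap_13 = histogram_overlap(img1, img2), histogram_overlap(img1, img3)
dist_12, dist_13 = wasserstein(img1, img2), wasserstein(img1, img3)
print(f"overlap: {overlap_12:.3f} vs {overlap_13:.3f}")
print(f"wasserstein: {dist_12:.2f} vs {dist_13:.2f}")
```

Unlike a p-value, both scores converge to a fixed population value as the arrays grow, instead of being pushed to an extreme by the sample size. This makes a fixed threshold more meaningful.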

At the moment, I can't quantify or define "similarity" for my application. I will be able to do that once I have a number that quantifies similarity; then I'll check more examples and see which threshold works for me. So I don't need an answer to the similar/not-similar question; rather, I'd like to know how to quantify the similarity. My final goal (which is not the question here) is a true/false result, i.e., whether these datasets are similar (true) or not (false), based on a value that quantifies the similarity (which is my question).

I know my question is a bit of a shot in the dark, but that's because I'm not sure which way to go: should I compare the means? The variances? The area of the histograms?

One last thing: I would like to automate the solution, since I have many of these paired datasets, so visual inspection will not work here.
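Once a score is chosen, the whole workflow can be automated. The sketch below uses `compare_successive`, a hypothetical helper that applies any score function to each pair of successive arrays and returns a true/false flag based on a threshold. The two-sample KS statistic serves as the example score. The threshold of 0.05 is a placeholder assumption, to be calibrated on labelled examples as described in the question.

```python
import numpy as np
from scipy import stats


def ks_score(a, b):
    """Example score: two-sample KS statistic D (0 = identical empirical distributions)."""
    return stats.ks_2samp(a[~np.isnan(a)], b[~np.isnan(b)]).statistic


def compare_successive(images, score_fn=ks_score, threshold=0.05):
    """Score each pair of successive images and flag it as similar if score <= threshold.

    The threshold is a placeholder (assumption); calibrate it on labelled pairs.
    """
    results = []
    for i in range(len(images) - 1):
        score = float(score_fn(images[i], images[i + 1]))
        results.append((i, i + 1, score, score <= threshold))
    return results


# Synthetic stand-ins (assumption): images 0 and 1 are similar, image 2 differs.
rng = np.random.default_rng(2)
images = [rng.normal(100, 10, 50_345),
          rng.normal(100.5, 10, 50_433),
          rng.normal(130, 15, 50_210)]
results = compare_successive(images)
for i, j, score, similar in results:
    print(f"images {i} -> {j}: score={score:.3f}, similar={similar}")
```

Passing the score as a function keeps the loop independent of the metric, so swapping in a histogram-overlap or distance score only changes `score_fn` and the direction/value of the threshold.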

Best Answer

You could try a sampling approach: draw smaller random samples from the blue, red and green populations and compare those samples using the traditional statistical tests you mentioned. Repeat this many times and count how often the null hypothesis (that the means are equal) is rejected out of the total number of tests. Keep in mind that p-values are random variables too, so at a 5% significance level you'd expect about 5% of these tests to reject the null hypothesis even when the means are the same (so potentially even in the red vs. blue case).
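Below is a minimal sketch of this resampling idea. It assumes Welch's t-test as the "traditional test". The subsample size, number of trials and significance level are arbitrary defaults (assumptions, not values from the answer). `rejection_rate` is a hypothetical helper that returns the fraction of trials in which the null hypothesis of equal means was rejected. Values near alpha suggest similar means, and values near 1 suggest different ones.

```python
import numpy as np
from scipy import stats


def rejection_rate(a, b, sample_size=100, n_trials=500, alpha=0.05, seed=None):
    """Fraction of subsample Welch t-tests that reject 'equal means' at level alpha.

    sample_size, n_trials and alpha are illustrative defaults (assumptions).
    """
    rng = np.random.default_rng(seed)
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]
    rejections = 0
    for _ in range(n_trials):
        # Draw a fresh random subsample from each image, without replacement.
        sa = rng.choice(a, size=sample_size, replace=False)
        sb = rng.choice(b, size=sample_size, replace=False)
        if stats.ttest_ind(sa, sb, equal_var=False).pvalue < alpha:
            rejections += 1
    return rejections / n_trials


# Synthetic stand-ins (assumption): blue and red share a distribution, green does not.
rng = np.random.default_rng(3)
blue = rng.normal(100, 10, 50_345)
red = rng.normal(100, 10, 50_433)
green = rng.normal(130, 15, 50_210)
rate_same = rejection_rate(blue, red, seed=4)
rate_diff = rejection_rate(red, green, seed=4)
print(f"blue vs red: {rate_same:.3f}, red vs green: {rate_diff:.3f}")
```

The rate depends on `sample_size`: larger subsamples detect smaller differences. Keep it fixed across all pairs so the rates are comparable.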

Alternatively, you could run the two-sample Kolmogorov-Smirnov test, which compares the entire distributions rather than only their means.
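The following is a sketch using `scipy.stats.ks_2samp`. Its p-value is likely to run into the same large-sample problem as the t-test. For that reason, this sketch uses the statistic D itself as the similarity score: D is the largest gap between the two empirical CDFs and lies between 0 and 1. `ks_distance` is a hypothetical wrapper that drops NaNs first, and the data are synthetic stand-ins.

```python
import numpy as np
from scipy import stats


def ks_distance(a, b):
    """Two-sample KS statistic D: the largest vertical gap between the two empirical CDFs.

    D lies in [0, 1]; 0 means the empirical distributions are identical.
    """
    a = a[~np.isnan(a)]  # ks_2samp is given NaN-free data
    b = b[~np.isnan(b)]
    return float(stats.ks_2samp(a, b).statistic)


# Synthetic stand-ins (assumption) for a similar and a dissimilar pair.
rng = np.random.default_rng(5)
blue = rng.normal(100, 10, 50_345)
red = rng.normal(100.5, 10, 50_433)
green = rng.normal(130, 15, 50_210)
d_similar = ks_distance(blue, red)
d_different = ks_distance(red, green)
print(f"D(blue, red) = {d_similar:.3f}, D(red, green) = {d_different:.3f}")
```

Because D does not depend on the means alone, it also reacts to differences in spread or shape, which addresses the "means? variances? histograms?" uncertainty in the question with a single number.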
