# Hypothesis Testing – How to Statistically Compare Two Large Continuous Datasets

hypothesis testingnumpyscipystatistical significancet-test

I have multiple images with 1 band/channel. This of an RGB image, then I only have the blue band/channel. In other words, multiple 1D datasets, or multiple 1D arrays.

I would like to statistically compare each pair of images where a pair means two successive images.

Each image contains about 50,000 pixels or values. This means that one can have 50,345 values and the other 50,433 values, so the number of values is not dramatically different but it is not always the same, so any method that is based on equal arrays will not be adequate here. This also means that the pixel in coordinates, x,y in image (or array) #1 does not have to correspond to the pixel in the same location in image #2.

Let's take these two examples (where each color corresponds to different image or array of values):

Put in a non-statistically way, the blue and the red are similar, while the red and the green are different.

I would like to perform a statistical test that will quantify this difference and then I can choose
a threshold and decide accordingly if these are similar enough for my application or not.

My question is – which statistical test or model or method is adequate for that assuming the distribution is similar to what you see in the examples, meaning that the distribution is not 100% Gaussian.

The t-test and z-test do not work here because the degree of freedom is huge hence the p-value is 0, see for example one (of many) t-tests variations I did:

stats.ttest_rel(img1,img2,nan_policy='omit')
>>> Ttest_relResult(statistic=-90.27773456178737, pvalue=0.0)

stats.ttest_ind(img1,img3,nan_policy='omit',equal_var=False)
>>> Ttest_indResult(statistic=360.2704559875767, pvalue=0.0)


I thought maybe to try to calculate the distance between the datasets or to calculate the overlapping histogram area between two datasets (because it seems better than comparing the mean) but I'm not sure which method (preferably in Python) is adequate for such a task.

At the moment, I can't quantify or define "similarity" for my application. I will be able to do that once I have a number that will quantify the similarity and then I'll have to check more examples and see which threshold is ok for me. So I do not need an answer to the similar/not similar question, rather I would like to get an answer as to how to quantify this similarity. My final goal (which is not the question here) – is to get a true/false result. i.e., are these datasets similar (true) or not (false), based on a value that will quantify the similarity (this is my question).

I know my question is a bit like shooting in the dark but this is because I am not sure which way to go- should I compare the means? the variance? the area of the histograms.

One last thing: I would like to be able to automate the solution since I have many of these paired datasets, so visual inspection will not work here.