Solved – How to know if two distributions have the same mean and standard deviation

biostatistics, distributions, measurement, t-test

I have some cells that fluoresce, and I can detect their fluorescence. So for each cell I get a number N that says how much light it emitted.
I also have two kinds of cells, A and B.
I have measured 1000 cells each of type A and type B, which gives a distribution for each type. I want to know whether the mean and standard deviation of these two distributions are the same or not.

In practice (the following numbers are made up; the actual measurements contain more than 25000 cells), I measured 100 A cells 10 times (1000 different cells in total), and I did the same for B cells.
Each measurement (which includes 100 cells) gives a sample mean and a standard deviation.
Now I want to see whether the mean and the standard deviation of the light emitted by cells A and B are the same or not, so I do a t-test.

My question is: if I rearrange my data, does it affect the t-test?
For example, consider two extremes:
1- I divide my 1000 measurements for each type of cell into 20 groups of 50 cells instead of 10 groups of 100.

2- I only consider one group of 1000 cells.

The mean stays the same, but the standard deviation changes based on group size. For example, in the second case it is zero!

How should I divide my data into groups for the t-test? Or, alternatively, how can I know whether two distributions have equal mean and standard deviation?

Best Answer

The tl;dr version: I think it would be better to do a two-sample Kolmogorov–Smirnov test on the original (not binned) data.


Extended answer:

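A direct reply to your last question is: you can use an F-test to compare variances (the squares of the standard deviations). Levene's test is a more robust alternative, since it is less sensitive to departures from Normality; Bartlett's test is another option, although, like the F-test, it is sensitive to non-Normality.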
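As a rough illustration of the variance comparison, here is a minimal sketch using only the Python standard library. It computes the plain variance ratio used by the F-test and the Levene statistic W with median centering (the Brown-Forsythe variant, which is my choice here, not something the question specifies). The function names are made up for this example. Under the null hypothesis of equal variances, W is approximately F-distributed with (k - 1, N - k) degrees of freedom, so you still need an F table or a statistics package for the p-value; `scipy.stats.levene` does the whole thing in one call if SciPy is available.

```python
from statistics import fmean, median, variance


def variance_ratio(a, b):
    """Ratio of the two sample variances (the F-test statistic)."""
    return variance(a) / variance(b)


def brown_forsythe_w(*groups):
    """Levene's W statistic using absolute deviations from each group's median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Replace every observation by its absolute deviation from the group median.
    z = []
    for g in groups:
        center = median(g)
        z.append([abs(v - center) for v in g])

    z_means = [fmean(zi) for zi in z]
    z_grand = fmean([v for zi in z for v in zi])

    # One-way ANOVA on the deviations: between-group vs. within-group spread.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    # Toy numbers, not real fluorescence data.
    cells_a = [1, 2, 3, 4, 5]
    cells_b = [2, 4, 6, 8, 10]
    print("variance ratio:", variance_ratio(cells_a, cells_b))
    print("Levene W (median-centered):", brown_forsythe_w(cells_a, cells_b))
```

Both functions take the raw per-cell values, so no grouping into batches of 50 or 100 is needed.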

However, there are other approaches that may be more appropriate. In particular, you didn't say anything about Normality, and both the t-test and the F-test assume that the distributions are Normal. For such large samples, looking at the skewness and kurtosis, together with a Q-Q plot, might help to assess Normality.
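To make the Normality check concrete, here is a small standard-library sketch that computes the moment-based skewness and excess kurtosis (both are 0 for a Normal distribution) and the point pairs for a Normal Q-Q plot. The function names and the plotting positions (i + 0.5)/n are assumptions made for this example; `scipy.stats.skew`, `scipy.stats.kurtosis` and `scipy.stats.probplot` are ready-made alternatives.

```python
from statistics import NormalDist, fmean, pstdev


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 of a sample."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def normal_qq_points(values):
    """Pairs (theoretical Normal quantile, observed value) for a Q-Q plot.

    The reference Normal uses the sample mean and standard deviation, so
    points close to the line y = x suggest approximate Normality.
    """
    n = len(values)
    reference = NormalDist(fmean(values), pstdev(values))
    ordered = sorted(values)
    return [(reference.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


if __name__ == "__main__":
    # Toy numbers, not real fluorescence data.
    sample = [1, 2, 3, 4, 5]
    print(skewness_and_excess_kurtosis(sample))
    print(normal_qq_points(sample))
```

Feeding the Q-Q pairs to any scatter plot gives the usual picture; strong curvature at the ends points to heavy or light tails.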

Nevertheless, comparing means and standard deviations does not guarantee that the distributions are similar: you may have two distributions with the same mean and standard deviation that, e.g., have different skewness and/or kurtosis. So, to compare distributions, you can use the two-sample Kolmogorov–Smirnov test. Again, a graphical method is also useful to see whether the differences (even if significant) are relevant. For instance, you may plot the CDF of both samples to see how large the differences between them are.
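For reference, the two-sample Kolmogorov–Smirnov statistic is just the largest vertical gap between the two empirical CDFs. The sketch below, with function names invented for this example, computes that gap from the raw values and the usual large-sample critical value c(α)·sqrt((n + m)/(n·m)), where c(α) = sqrt(-ln(α/2)/2). It is only an approximation for illustration; `scipy.stats.ks_2samp` returns the statistic together with a p-value.

```python
from bisect import bisect_right
from math import log, sqrt


def ecdf(sorted_sample, x):
    """Empirical CDF at x: fraction of observations <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_two_sample_statistic(a, b):
    """Largest absolute difference between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The maximum gap is always attained at one of the observed values.
    return max(
        abs(ecdf(a_sorted, x) - ecdf(b_sorted, x)) for x in a_sorted + b_sorted
    )


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value; reject equality if the statistic exceeds it."""
    c = sqrt(-log(alpha / 2) / 2)
    return c * sqrt((n + m) / (n * m))


if __name__ == "__main__":
    # Toy numbers, not real fluorescence data.
    cells_a = [1, 2, 3, 4]
    cells_b = [3, 4, 5, 6]
    print("D =", ks_two_sample_statistic(cells_a, cells_b))
    print("critical value for 1000 vs 1000 cells:", ks_critical_value(1000, 1000))
```

The same `ecdf` helper can be evaluated on a grid of light intensities to draw the two CDF curves mentioned above.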

Regarding binning the data, I usually advise against it. I think you can compute the mean and standard deviation of $N$ directly from all the cells in group A and in group B, and these will be indicative of each sample's characteristics.
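To see why the grouping matters in the question, note that the standard deviation of group means is the standard error of the mean, roughly σ/√n for groups of n cells, so it shrinks as the groups get larger, and with a single group it cannot be estimated at all (rather than being zero). The standard deviation of the individual cells does not depend on any grouping. The sketch below, with hypothetical function names and simulated values rather than real measurements, shows both quantities and a Welch t statistic computed on the unbinned data; `scipy.stats.ttest_ind(a, b, equal_var=False)` would also give the p-value.

```python
import random
from math import sqrt
from statistics import fmean, stdev, variance


def welch_t(a, b):
    """Welch's t statistic computed from the raw (unbinned) observations."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / standard_error


def sd_of_group_means(values, group_size):
    """SD of the means of consecutive groups; None if fewer than two groups."""
    means = [
        fmean(values[i:i + group_size])
        for i in range(0, len(values) - group_size + 1, group_size)
    ]
    return stdev(means) if len(means) >= 2 else None


if __name__ == "__main__":
    random.seed(0)
    # Simulated light intensities; the means and SDs are arbitrary assumptions.
    cells_a = [random.gauss(100.0, 15.0) for _ in range(1000)]
    cells_b = [random.gauss(102.0, 15.0) for _ in range(1000)]

    print("SD of individual A cells:", stdev(cells_a))
    for size in (50, 100, 1000):
        print("group size", size, "-> SD of group means:",
              sd_of_group_means(cells_a, size))
    print("Welch t on raw data:", welch_t(cells_a, cells_b))
```

With about 1000 cells per type, a value of |t| above roughly 1.96 corresponds to significance at the 5% level.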

By the way, if you had a different type of data, where each cell is either on or off, then the more appropriate test for this dichotomous case would be Fisher's exact test.
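For that on/off case, the data reduce to a 2x2 table of counts (type A vs. B, on vs. off). Here is a minimal standard-library sketch of the two-sided Fisher's exact test, which sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name and the counts in the demo are invented for illustration; `scipy.stats.fisher_exact` is the off-the-shelf option.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cell types (A, B) and columns the states (on, off).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Add up every table that is at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 4 A cells are on, 1 of 4 B cells are on.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```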