Solved – Statistical test for a value being significantly further from the population mean: is it a Z-test or a T-test

hypothesis testing, statistical significance

How significant is a value compared to a list of values? In most cases, statistical testing involves comparing a sample set to a population. In my case the sample consists of a single value, which I compare to the population.

I am a dilettante in statistical hypothesis testing confronted with perhaps the most basic problem. It is not just one test but hundreds of them: I have a parameter space and must run a significance test for every point. Both the value and the background list (population) are generated for each parameter combination. I then order the results by p-value to find interesting parameter combinations. In fact, finding parameter combinations where the p-value is high (non-significance) is also important.

So let's take one single test: I have a value computed from a selected set, and a background set of values computed from randomly chosen training sets. The computed value is 0.35 and the background set is (probably?) normally distributed with a mean of 0.25 and a very narrow standard deviation (on the order of 1e-7). I don't actually know the distribution, because the samples are computed from something else; they are not random numbers drawn from some known distribution, so "background" is the right word for it.

The null hypothesis would be that the mean of the background set equals my computed value of 0.35. When should I treat this as a Z-test and when as a T-test? I want the value to be significantly higher than the population mean, so it is a one-tailed test.

I am a bit confused as to what to consider as a sample: either I have a sample of one (the observation) and the background list is the population, OR my sample is the background list and I am comparing it to the whole (unsampled) population, which according to the null hypothesis should have the same mean. Once this is decided, the test goes in different directions, I guess.

If it is a T-test, how do I compute its p-value? I would like to compute it myself rather than using an R/Python/Excel function (I already know how to do that) therefore I must establish the correct formula first.

To begin with, I suspect a T-test is a bit too general, since in my case the T-test would be linked to the sample size and would have the form $$T=Z/s,$$ where $$Z=\frac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}}$$ and $$s=\hat{\sigma}/\sigma$$ is the sample standard deviation relative to the population standard deviation. So I have two cases: either my sample size is the size of the population, which I "guess" would mean I am dealing with a Z-test, or the population statistics (n and std) are unknown but the distribution can be approximated in some way, and I am really dealing with a T-test. In any case, my questions are:

  1. How do I compute a p-value? (i.e. not using an R/Python/Excel function or p-value table look-up but actually compute it based on a formula, because I want to know what I am doing)
  2. How do I decide a significance threshold based on my sample size? (a formula would be nice)
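For concreteness, this is the kind of from-scratch computation I mean for the Z case: the one-tailed p-value follows directly from the normal CDF, which the standard library exposes via the complementary error function (a sketch in Python; the numbers 0.35, 0.25 and 1e-7 are from above):

```python
import math

def z_pvalue_upper(x, mu, sigma):
    """One-tailed (upper) p-value under a normal null:
    p = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)), no table look-up needed."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

# sanity check against the classic critical value: z = 1.96 -> p ~ 0.025
print(z_pvalue_upper(1.96, 0.0, 1.0))   # ~0.025

# the numbers from this question: z is so large the p-value underflows to 0
print(z_pvalue_upper(0.35, 0.25, 1e-7))  # 0.0
```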

Best Answer

You raise an interesting question. First things first: if you have an observation of 0.35, a mean of 0.25, and a standard deviation of 1/10^7 (that is how I interpret your e-7 bit), you really don't need any hypothesis-testing exercise. Your observation of 0.35 is very different from the mean of 0.25: it is on the order of a million standard deviations away from the mean (0.1 / 1e-7 = 10^6), and even more standard errors away.
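A quick back-of-the-envelope check of that distance (Python; the background-set size n is made up, since the question does not give one):

```python
import math

observation, mean, sd = 0.35, 0.25, 1e-7
n = 100  # hypothetical background-set size, not given in the question

sds_away = (observation - mean) / sd                   # distance in standard deviations
ses_away = (observation - mean) / (sd / math.sqrt(n))  # distance in standard errors

print(round(sds_away))  # 1000000: a million standard deviations
print(round(ses_away))  # 10000000: ten million standard errors
```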

The difference between the Z-test and the t-test mainly comes down to sample size. With samples smaller than about 120 you should use the t-test to calculate p-values; with larger samples it makes little if any difference which one you use. It is fun to calculate it both ways regardless of sample size and observe how small the difference between the two tests is.

As far as calculating things yourself: the t-stat is the difference between your observation and the mean, divided by the standard error; the standard error is the standard deviation divided by the square root of the sample size. Now you have your t-stat. To calculate a p-value from it, I think there is no alternative to looking up your t value in a t-table (or integrating the t density numerically). If you accept a simple Excel alternative, TDIST(t stat, DF, 1 or 2 for a 1- or 2-tailed p-value) does the trick. To calculate a p-value using Z, the Excel formula for a one-tailed test is 1 - NORMSDIST(Z value), the Z value being the same as the t-stat (the number of standard errors away from the mean).
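A from-scratch version of that recipe (a sketch in Python; the sample size n = 10 is a made-up illustration, and the t tail area is obtained by numerically integrating the Student-t density rather than by table look-up):

```python
import math

def t_pdf(x, df):
    """Student-t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_pvalue_upper(t, df, upper=60.0, steps=200_000):
    """One-tailed p-value P(T > t): trapezoid-rule integral of the density
    from t up to a cutoff beyond which the remaining tail mass is negligible."""
    if t >= upper:
        return 0.0
    h = (upper - t) / steps
    area = 0.5 * (t_pdf(t, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(t + i * h, df)
    return area * h

# check against a textbook critical value: t = 2.262 at df = 9 -> p ~ 0.025
print(round(t_pvalue_upper(2.262, 9), 4))

# the recipe above, with made-up n = 10
x, mu, s, n = 0.35, 0.25, 1e-7, 10
se = s / math.sqrt(n)       # standard error
t_stat = (x - mu) / se      # millions of standard errors: p is effectively 0
print(t_pvalue_upper(t_stat, n - 1))  # 0.0
```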

Just as a caveat, these methods of hypothesis testing can get distorted by sample size. In other words: the larger your sample size, the smaller your standard error, the higher your resulting Z value or t-stat, the lower the p-value, and the higher your statistical significance. As a shortcut through this logic, large sample sizes tend to produce high statistical significance. But high statistical significance in association with a large sample size can be completely immaterial. In other words, "statistically significant" is a mathematical phrase; it does not necessarily mean significant in the everyday (Webster's dictionary) sense.
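The distortion is easy to see numerically: hold a tiny difference and the standard deviation fixed, and the Z statistic grows like the square root of the sample size (a sketch in Python; the numbers are made up):

```python
import math

diff, sd = 0.01, 1.0  # a tiny, fixed effect
for n in (100, 10_000, 1_000_000):
    z = diff / (sd / math.sqrt(n))         # z grows like sqrt(n)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # one-tailed p-value
    print(n, round(z, 2), p)
# the same negligible effect goes from p ~ 0.46 (n = 100)
# to "highly significant" p < 1e-20 (n = 1,000,000)
```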

To get away from this large-sample-size trap, statisticians have moved toward Effect Size methods. These use the standard deviation, instead of the standard error, as the unit of statistical distance between two observations. In such a framework, sample size has no impact on your measure of effect. Using effect sizes also tends to move you away from p-values and toward confidence intervals, which can be more meaningful in plain English.
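The contrast in a sketch (Python; numbers made up): the standard-error-based Z statistic grows with n, while the standard-deviation-based effect size (in the style of Cohen's d) does not:

```python
import math

diff, sd = 0.1, 0.5
for n in (25, 2_500):
    se = sd / math.sqrt(n)
    z = diff / se  # standard-error distance: 1.0 at n=25, 10.0 at n=2500
    d = diff / sd  # effect size in SD units: 0.2 regardless of n
    print(n, round(z, 2), d)
```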
