What actually is a standardized test statistic?


So in my statistics class, I've memorized the formula for a standardized test statistic: (sample statistic – hypothesized statistic) / SE. What I fail to understand is what the standardized test statistic is on a conceptual level, and I haven't been able to find a satisfactory explanation.

At first, I thought it was like a standard score: its formula mirrors the z-score formula, as does its name, and I thought my textbook introduced it as such. But if it were like a standard score, I assume it would lie wherever the test statistic lay (correct me if I'm wrong), which would mean it would differ very little from the test statistic and tell us nothing new. Instead, it seems to function like a critical value, separating the tail of a hypothesis test's curve from the rest of the area under the curve. I can't tell whether this means the standardized test statistic is the same thing as the critical value z_0 that separates the rejection region from the nonrejection region. That would make sense: tails seem to be the same thing as rejection regions, and the formula for the standardized test statistic, mirroring the z-score formula as it does, would be a good way to find a critical value. But then why did my textbook explain the two completely separately?

I feel like the answer may be obvious – maybe it really is as simple as "z_0 = standardized test statistic" and I just didn't read the textbook closely enough. But I would love some input explaining this concept. Thanks!

Best Answer

Interesting question - it shows you are trying to understand a tricky statistical concept at a deeper level.

First of all, your definition of a standardized test statistic is a bit off. The correct definition should be:

standardized test statistic: (sample statistic - hypothesized parameter) / SE. 
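In code, the definition is a one-liner. Here is a minimal sketch (the function name and signature are my own, for illustration; they do not come from any particular library):

```python
def standardized_test_statistic(sample_stat: float,
                                hypothesized_param: float,
                                se: float) -> float:
    """(sample statistic - hypothesized parameter) / standard error:
    how many standard errors the sample statistic lies from the value
    the null hypothesis claims for the population parameter."""
    return (sample_stat - hypothesized_param) / se
```

Note that it is built exactly like a z-score, but it standardizes the sample statistic (using the standard error of its sampling distribution), not an individual observation.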

As an example, let's say that you are interested in testing the following hypotheses for a population of students:

Ho: The average weight of the students in the population is 60kg;
Ha: The average weight of the students in the population is either smaller than 60kg or larger than 60kg.

The student population might consist of all undergraduate students in your local city.

To test these hypotheses, you are planning to select 100 students at random from the student population.

The sample statistic will be the average weight of the 100 students in your sample.

The hypothesized parameter will be 60kg (i.e., the value of the average weight in the student population hypothesized under the null hypothesis Ho.)

Using the weight data you are going to collect from the 100 students in your sample, you can determine whether you have enough evidence in the data to reject the null hypothesis Ho in favour of the alternative hypothesis Ha.

Let's say your actual sample yields an average weight of 70kg for the 100 students it contains. Would you consider the difference between the sample average weight (i.e., 70kg) and the hypothesized population average weight (i.e., 60kg) to be "large enough" that you would reject Ho in favour of Ha with a high degree of confidence?

That is where the standard error (SE) comes into play - think of the SE as the yardstick for deciding when the difference between the sample average weight and the hypothesized population average weight is "large enough" for you to reject Ho in favour of Ha. The SE gives you the context you need to judge whether the discrepancy between the sample statistic and the hypothesized value of the population parameter under the null hypothesis is "large enough".

How does the SE get calculated for the present example? Imagine that you could draw all possible samples of 100 students, at random, from your target student population, and that for each of these samples you could compute the difference between the sample average weight and the population average weight hypothesized under the null. Maybe the first sample would give you a difference of 12kg (= 72kg - 60kg), the second a difference of 4kg (= 64kg - 60kg), the third a difference of -2kg (= 58kg - 60kg), etc.

The standard error would simply be the standard deviation of all these differences, and it would give you a sense of how variable the differences are from sample to sample. The larger the SE, the more spread out the differences would be (i.e., the less similar in value); the smaller the SE, the less spread out (i.e., the more similar in value). In practice, the SE is computed from theory rather than by actually carrying out this thought experiment.
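To make the thought experiment concrete, here is a minimal simulation sketch. The population standard deviation of 20kg is an assumption made purely for illustration (the answer does not state one); it is chosen so that the theoretical SE is 20/sqrt(100) = 2kg, matching the value used in the next paragraph:

```python
import numpy as np

rng = np.random.default_rng(42)

mu_null = 60.0   # average weight hypothesized under Ho (kg)
sigma = 20.0     # assumed population standard deviation (kg); illustrative only
n = 100          # students per sample
reps = 100_000   # number of hypothetical samples to draw

# Draw many samples of n students from a population where Ho is true,
# and record the difference (sample average - hypothesized average) for each.
samples = rng.normal(mu_null, sigma, size=(reps, n))
diffs = samples.mean(axis=1) - mu_null

print(f"SE from simulation: {diffs.std(ddof=1):.2f} kg")   # ~2.00
print(f"SE from theory:     {sigma / np.sqrt(n):.2f} kg")  # 20 / sqrt(100) = 2.00
```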

Let's say your SE comes out to be 2kg. Then the observed difference of 70kg - 60kg = 10kg is 5 times the SE. That ratio, (70kg - 60kg) / 2kg = 5, is exactly the standardized test statistic from the definition above: the discrepancy between the sample statistic and the hypothesized parameter, measured in standard errors. A discrepancy of 5 standard errors is "large enough" for you to reject the null hypothesis Ho in favour of Ha.
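This also resolves the question asked above: the standardized test statistic (here, 5) is computed from your data, whereas the critical value (e.g., 1.96) is a fixed cutoff determined by the significance level before you look at the data; you reject Ho when the statistic falls beyond the critical value. A minimal sketch of that comparison, assuming a normal sampling distribution and a two-sided test at the conventional 5% level (neither is specified above):

```python
from scipy.stats import norm

sample_mean = 70.0   # observed sample average (kg)
mu_null = 60.0       # average weight hypothesized under Ho (kg)
se = 2.0             # standard error (kg)

# The standardized test statistic: the discrepancy measured in SEs.
z = (sample_mean - mu_null) / se   # (70 - 60) / 2 = 5.0

# Critical value for a two-sided test at the 5% significance level
# (an assumed, conventional choice).
z_crit = norm.ppf(0.975)           # ~1.96

p_value = 2 * norm.sf(abs(z))      # two-sided p-value, ~5.7e-7

print(f"z = {z:.1f}, critical value = {z_crit:.2f}, p-value = {p_value:.1e}")
# |z| = 5 is far beyond 1.96, so reject Ho in favour of Ha.
```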
