What actually is a standardized test statistic?


So in my statistics class, I've memorized the formula for a standardized test statistic: (sample statistic – hypothesized statistic) / SE. What I fail to understand is what the standardized test statistic is on a conceptual level, and I haven't been able to find a satisfactory explanation.

At first, I thought it was like a standard score: its formula mirrors the z-score formula, as does its name, and I thought my textbook introduced it as such. But if it were like a standard score, I assume it would lie wherever the test statistic lay (correct me if I'm wrong), which would mean it would differ very little from the test statistic and tell us nothing new. Instead, it seems to function like a critical value, separating the tail of a hypothesis test's curve from the rest of the area under the curve. I can't tell whether this means the standardized test statistic is the same thing as the critical value z_0 that separates the rejection region from the nonrejection region. That would make sense: tails seem to be the same thing as rejection regions, and the formula for the standardized test statistic, mirroring the z-score formula as it does, would be a good way to find a critical value. But then why did my textbook explain the two completely separately?

I feel like the answer may be obvious – maybe it really is as simple as "z_0 = standardized test statistic" and I just didn't read the textbook closely enough. But I would love some input explaining this concept. Thanks!

Best Answer

Interesting question - it shows you are trying to understand a tricky statistical concept at a deeper level.

First of all, your definition of a standardized test statistic is a bit off. The correct definition should be:

standardized test statistic: (sample statistic - hypothesized parameter) / SE. 
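In code, the definition is a one-liner. Here is a minimal sketch (the function name and signature are my own, for illustration; they do not come from any particular library):

```python
def standardized_test_statistic(sample_stat: float,
                                hypothesized_param: float,
                                se: float) -> float:
    """(sample statistic - hypothesized parameter) / standard error:
    how many standard errors the sample statistic lies from the value
    the null hypothesis claims for the population parameter."""
    return (sample_stat - hypothesized_param) / se
```

Note that it is built exactly like a z-score, but it standardizes the sample statistic (using the standard error of its sampling distribution), not an individual observation.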

As an example, let's say that you are interested in testing the following hypotheses for a population of students:

Ho: The average weight of the students in the population is 60kg;
Ha: The average weight of the students in the population is either smaller than 60kg or larger than 60kg.

The student population might consist of all undergraduate students in your local city.

To test these hypotheses, you are planning to select 100 students at random from the student population.

The sample statistic will be the average weight of the 100 students in your sample.

The hypothesized parameter will be 60kg (i.e., the value of the average weight in the student population hypothesized under the null hypothesis Ho.)

Using the weight data you are going to collect from the 100 students in your sample, you can determine whether you have enough evidence in the data to reject the null hypothesis Ho in favour of the alternative hypothesis Ha.

Let's say your actual sample yields an average weight of 70kg for the 100 students it contains. Would you consider the difference between the sample average weight (i.e., 70kg) and the hypothesized population average weight (i.e., 60kg) to be "large enough" that you would reject Ho in favour of Ha with a high degree of confidence?

That is where the standard error (SE) comes into play - think of the SE as the yardstick for deciding when the difference between the sample average weight and the hypothesized population average weight is "large enough" for you to reject Ho in favour of Ha. The SE gives you the context you need to judge whether the discrepancy between the sample statistic and the hypothesized value of the population parameter under the null hypothesis is "large enough".

How does the SE get calculated for the present example? Imagine that you could draw all possible samples of 100 students, at random, from your target student population, and that for each of these samples you could compute the difference between the sample average weight and the population average weight hypothesized under the null. Maybe the first sample would give you a difference of 12kg (= 72kg - 60kg), the second a difference of 4kg (= 64kg - 60kg), the third a difference of -2kg (= 58kg - 60kg), etc.

The standard error would simply be the standard deviation of all these differences, and it would give you a sense of how variable the differences are from sample to sample. The larger the SE, the more spread out the differences would be (i.e., the less similar in value); the smaller the SE, the less spread out (i.e., the more similar in value). In practice, the SE is computed from theory rather than by actually carrying out this thought experiment.
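To make the thought experiment concrete, here is a minimal simulation sketch. The population standard deviation of 20kg is an assumption made purely for illustration (the answer does not state one); it is chosen so that the theoretical SE is 20/sqrt(100) = 2kg, matching the value used in the next paragraph:

```python
import numpy as np

rng = np.random.default_rng(42)

mu_null = 60.0   # average weight hypothesized under Ho (kg)
sigma = 20.0     # assumed population standard deviation (kg); illustrative only
n = 100          # students per sample
reps = 100_000   # number of hypothetical samples to draw

# Draw many samples of n students from a population where Ho is true,
# and record the difference (sample average - hypothesized average) for each.
samples = rng.normal(mu_null, sigma, size=(reps, n))
diffs = samples.mean(axis=1) - mu_null

print(f"SE from simulation: {diffs.std(ddof=1):.2f} kg")   # ~2.00
print(f"SE from theory:     {sigma / np.sqrt(n):.2f} kg")  # 20 / sqrt(100) = 2.00
```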

Let's say your SE comes out to be 2kg. Then the observed difference of 70kg - 60kg = 10kg is 5 times the SE. That ratio, (70kg - 60kg) / 2kg = 5, is exactly the standardized test statistic from the definition above: the discrepancy between the sample statistic and the hypothesized parameter, measured in standard errors. A discrepancy of 5 standard errors is "large enough" for you to reject the null hypothesis Ho in favour of Ha.
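This also resolves the question asked above: the standardized test statistic (here, 5) is computed from your data, whereas the critical value (e.g., 1.96) is a fixed cutoff determined by the significance level before you look at the data; you reject Ho when the statistic falls beyond the critical value. A minimal sketch of that comparison, assuming a normal sampling distribution and a two-sided test at the conventional 5% level (neither is specified above):

```python
from scipy.stats import norm

sample_mean = 70.0   # observed sample average (kg)
mu_null = 60.0       # average weight hypothesized under Ho (kg)
se = 2.0             # standard error (kg)

# The standardized test statistic: the discrepancy measured in SEs.
z = (sample_mean - mu_null) / se   # (70 - 60) / 2 = 5.0

# Critical value for a two-sided test at the 5% significance level
# (an assumed, conventional choice).
z_crit = norm.ppf(0.975)           # ~1.96

p_value = 2 * norm.sf(abs(z))      # two-sided p-value, ~5.7e-7

print(f"z = {z:.1f}, critical value = {z_crit:.2f}, p-value = {p_value:.1e}")
# |z| = 5 is far beyond 1.96, so reject Ho in favour of Ha.
```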
