Two sample z test: Applicability


The following exercise is taken from Statistics by Freedman:

A geography test was given to a simple random sample of 250 high-school students
in a certain large school district. One question involved an outline map of Europe,
with the countries identified only by number. The students were asked to pick out
Great Britain and France. As it turned out, 65.6% could find France, compared to
70.4% for Great Britain. Is the difference statistically significant? Or can this be
determined from the information given?

The author says

Exercise 5 on p. 515 (the geography test)
is an example of when not to use the formulas. Each subject makes two responses,
by answering (i) the question on Great Britain, and (ii) the question on France.
Both responses are observed, because each subject answers both questions. And
the responses are correlated, because a geography whiz is likely to be able to
answer both questions correctly, while someone who does not pay attention to
maps is likely to get both of them wrong. By contrast, if you took two independent
samples, asking one group about France and the other about Great Britain, the
formula would be fine. (That would be an inefficient way to do the study.)

The author is talking about the two-sample z-test, and the formula he is referring to is

$$Var(\bar{X}-\bar{Y})=Var(\bar{X})+Var(\bar{Y})$$

I understand that the variables are correlated, so $Cov(\bar{X},\bar{Y})$ should also appear in the formula.
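For reference, the general identity (a standard result, stated here in the same notation rather than quoted from the book) is

$$Var(\bar{X}-\bar{Y})=Var(\bar{X})+Var(\bar{Y})-2\,Cov(\bar{X},\bar{Y})$$

so the formula above is the special case in which the covariance is zero. When the covariance is positive, as the author argues it is here, the two-term formula overstates the variance of the difference.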

What I don't understand is this passage:

By contrast, if you took two independent samples, asking one group about France and the other about Great Britain, the formula would be fine. (That would be an inefficient way to do the study.)

  1. Why don't we need to consider the covariance in this case, but only in the single-sample case?

Geography whizzes are going to be present in the second independent sample in approximately the same proportion as in the first sample.
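A small simulation may help make the distinction concrete. With two independent samples, a whiz in one group raises only that group's percentage, so nothing links the two sample percentages from survey to survey. In the paired design the same whiz raises both. The sketch below is only illustrative: the joint probabilities in `p_joint` are made up (the exercise gives just the margins 70.4% and 65.6%), and names such as `one_survey` are hypothetical.

```r
set.seed(1)

n    <- 250     # students per survey, as in the exercise
reps <- 5000    # number of simulated surveys

# Assumed joint probabilities for (GB, France) = (1,1), (1,0), (0,1), (0,0).
# They reproduce the margins 70.4% (GB) and 65.6% (France); the split is invented.
p_joint <- c(0.60, 0.104, 0.056, 0.24)

one_survey <- function() {
  # Paired design: one sample, every student answers both questions.
  cells <- rmultinom(1, n, p_joint)[, 1]
  gb_paired <- (cells[1] + cells[2]) / n
  fr_paired <- (cells[1] + cells[3]) / n
  # Independent design: a separate sample of n students for each question.
  gb_indep <- rbinom(1, n, 0.704) / n
  fr_indep <- rbinom(1, n, 0.656) / n
  c(gb_paired, fr_paired, gb_indep, fr_indep)
}

sims <- t(replicate(reps, one_survey()))   # reps x 4 matrix of sample proportions

cov_paired <- cov(sims[, 1], sims[, 2])    # clearly positive
cov_indep  <- cov(sims[, 3], sims[, 4])    # close to zero

# In the paired design the two-term formula overstates the variance of the difference.
var_diff_paired <- var(sims[, 1] - sims[, 2])
var_sum_paired  <- var(sims[, 1]) + var(sims[, 2])

c(cov_paired = cov_paired, cov_indep = cov_indep)
c(var_diff_paired = var_diff_paired, var_sum_paired = var_sum_paired)
```

Under these assumptions the covariance term is what separates the two designs: it is essentially zero for independent samples, and large enough in the paired design to change the standard error materially.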

  2. The author says that the example problem can be solved using more advanced mathematics if we have information about the percentages in the following categories:

    1 1 found Great Britain and France on the map

    1 0 found Great Britain; could not find France

    0 1 could not find Great Britain; found France

    0 0 could not find either country

I would like to know what this advanced mathematics is.

Thanks.

Best Answer

Here is Minitab output for fake data in such a table. I did not try to match the percentages you give in your problem. The null hypothesis is that recognition of GB and of France are independent abilities. The small p-value indicates the null hypothesis is rejected.

Chi-Square Test for Association: France, GB 

Rows: France   Columns: GB

         Yes     No  All

Yes       43     21   64
       33.58  30.42
       2.640  2.915

No        10     27   37
       19.42  17.58
       4.566  5.042

All       53     48  101

Cell Contents:      Count
                    Expected count
                    Contribution to Chi-square

Pearson Chi-Square = 15.163, 
    DF = 1, P-Value = 0.000

Computations:

The observed count for the upper-left cell is $X_{11} = 43.$

The expected count for the upper-left cell is $E_{11} = 64(53)/101 = 33.58.$

The contribution for that cell is $(X_{11} - E_{11})^2/E_{11} = 2.64.$

The chi-squared statistic $15.163$ is the sum of the 'contributions' from all four cells.
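The hand computation above can be reproduced in R. This is a minimal sketch using the counts from the Minitab table; the object names (`tab`, `expected`, `contrib`) are just illustrative.

```r
# Counts from the Minitab table: rows = France (Yes, No), columns = GB (Yes, No).
tab <- matrix(c(43, 10, 21, 27), nrow = 2,
              dimnames = list(France = c("Yes", "No"), GB = c("Yes", "No")))

# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Per-cell contributions and their sum, the Pearson chi-squared statistic.
contrib <- (tab - expected)^2 / expected
chisq   <- sum(contrib)

round(expected, 2)
round(contrib, 3)
round(chisq, 3)

# Cross-check against the built-in test, without the continuity correction.
builtin <- chisq.test(tab, correct = FALSE)$statistic
builtin
```

`correct = FALSE` is needed for the comparison because `chisq.test` applies Yates' continuity correction to 2 x 2 tables by default, which would give a smaller statistic than the one Minitab reports.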

From the row with DF=1 in a printed table of chi-squared distributions, you can see that the value $3.8415$ cuts 5% from the upper tail of the distribution $\mathsf{Chisq}(1),$ so that any value of the chi-squared statistic above 3.8415 would lead you to believe that identification of GB and identification of France are not independent abilities (at the 5% level of significance). The chi-squared statistic here is $15.16 > 3.84.$
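If no printed table is at hand, the same critical value and the p-value can be obtained in R (a short sketch; the variable names are illustrative):

```r
chisq_stat <- 15.163                   # statistic reported by Minitab above

crit  <- qchisq(0.95, df = 1)          # value cutting 5% from the upper tail
p_val <- pchisq(chisq_stat, df = 1, lower.tail = FALSE)

crit                # critical value at the 5% level
p_val               # upper-tail probability; Minitab rounds it to 0.000
chisq_stat > crit   # TRUE means reject independence at the 5% level
```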

Perhaps you can find a more complete discussion of this kind of test later in your text.

Addendum. Suppose my data are real. The 43 + 27 = 70 subjects who got both countries right or both wrong give no information about whether GB or France is easier to identify on a map. Of the other 31, who got exactly one country right, only 10 found GB but missed France (row No, column Yes); the remaining 21 found France but missed GB.

Those 10 are in the lower tail of $\mathsf{Binom}(31, .5).$ That is, assuming both countries are equally easy to identify, there is only probability 0.0354 < 5% that 10 or fewer of the 31 would fall in that cell. I would hesitate to draw strong conclusions from only 31 useful responses, but in these fake data there does seem to be evidence that more people recognize France than GB on a map. (With real data the opposite direction, as in the 70.4% versus 65.6% of the exercise, wouldn't be surprising, because many people know GB is an island nation, and there aren't many big islands on a map of Europe.)

In R:

pbinom(10, 31, .5)
[1] 0.03537777
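The discordant-pair argument above is essentially McNemar's test, which is one common form of the "more advanced mathematics" for paired yes/no responses. That is an assumption about what the textbook has in mind, since it does not name the method. Writing $n_{10}$ and $n_{01}$ for the numbers of subjects who found exactly one of the two countries, the large-sample statistic is $(n_{10}-n_{01})^2/(n_{10}+n_{01})$, referred to $\mathsf{Chisq}(1).$ A sketch with the fake counts from the table, using illustrative variable names:

```r
# Same fake counts as above: rows = France (Yes, No), columns = GB (Yes, No).
tab <- matrix(c(43, 10, 21, 27), nrow = 2,
              dimnames = list(France = c("Yes", "No"), GB = c("Yes", "No")))

# Discordant counts: subjects who found exactly one of the two countries.
n_gb_only <- tab["No", "Yes"]    # found GB, missed France (10)
n_fr_only <- tab["Yes", "No"]    # found France, missed GB (21)

# McNemar statistic by hand (no continuity correction) and via the built-in test.
stat_by_hand <- (n_gb_only - n_fr_only)^2 / (n_gb_only + n_fr_only)
mc <- mcnemar.test(tab, correct = FALSE)

# Exact version: the binomial calculation above, here as a two-sided test.
ex <- binom.test(n_gb_only, n_gb_only + n_fr_only, p = 0.5)

stat_by_hand
mc$p.value
ex$p.value
```

The two-sided p-values are roughly 0.048 (large-sample) and 0.071 (exact), on either side of 5%, which fits the caution about drawing strong conclusions from only 31 discordant responses. The one-sided value 0.0354 from `pbinom` is half of the exact two-sided value.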