Chi-Squared Test – Understanding Relationship with Test of Equal Proportions

chi-squared-test, contingency-tables, proportion, z-test

Suppose that I have three populations and four mutually exclusive characteristics. I take a random sample from each population and construct a crosstab (frequency table) of the characteristics that I am measuring. Am I correct in saying that:

  1. If I wanted to test whether there is any relationship between the populations and the characteristics (e.g. whether one population has a higher frequency of one of the characteristics), I should run a chi-squared test and see whether the result is significant.

  2. If the chi-squared test is significant, it only shows me that there is some relationship between the populations and characteristics, but not how they are related.

  3. Furthermore, not all of the characteristics need to be related to the population. For example, if the different populations have significantly different distributions of characteristics A and B, but not of C and D, then the chi-squared test may still come back as significant.

  4. If I wanted to measure whether or not a specific characteristic is affected by the population, then I can run a test for equal proportions (I have seen this called a z-test, or as prop.test() in R) on just that characteristic.

In other words, is it appropriate to use prop.test() to determine more precisely the nature of a relationship between two sets of categories once the chi-squared test says that there is a significant relationship?

Best Answer

Very short answer:

The chi-squared test (chisq.test() in R) compares the observed frequencies in each cell of a contingency table with the frequencies expected under independence (for each cell, the product of its row and column totals divided by the grand total). It is used to determine whether the deviations between the observed and the expected counts are too large to be attributed to chance. Which cells drive a departure from independence can be checked by inspecting the residuals (try ?mosaicplot or ?assocplot, and also look at the vcd package). Use fisher.test() for an exact test (relying on the hypergeometric distribution).
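
As a minimal sketch of the above, the expected counts and residuals can be computed on a 3 x 4 table like the one described in the question. The counts and the labels (P1 to P3, A to D) are made up purely for illustration; only chisq.test() and its `expected`, `statistic` and `stdres` components come from base R.

```r
# Hypothetical counts: 3 populations (rows) x 4 characteristics (columns).
counts <- matrix(c(30, 20, 25, 25,
                   15, 35, 24, 26,
                   28, 22, 26, 24),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(population = c("P1", "P2", "P3"),
                                 characteristic = c("A", "B", "C", "D")))

# Expected counts under independence:
# (row total * column total) / grand total, for every cell.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

fit <- chisq.test(counts)

# The by-hand expected counts agree with those used by chisq.test().
all.equal(unname(expected), unname(fit$expected))

# Pearson statistic by hand: sum over cells of (observed - expected)^2 / expected.
pearson <- sum((counts - expected)^2 / expected)
all.equal(pearson, unname(fit$statistic))

# Standardized residuals: cells with a large absolute value are the ones
# that contribute most to a departure from independence.
round(fit$stdres, 2)
```

The residual matrix has the same shape as the table, so it shows which population/characteristic combinations are over- or under-represented, which the overall test statistic alone does not.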

The prop.test() function in R lets you test whether proportions are equal across groups, or whether they differ from specified theoretical probabilities. It is often referred to as a $z$-test because, for two groups, the test statistic looks like this:

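$$ z=\frac{\hat p_1-\hat p_2}{\sqrt{\hat p \left(1-\hat p \right) \left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} $$

where $\hat p_i=x_i/n_i$ is the observed proportion in group $i$ ($x_i$ successes out of $n_i$ observations), $\hat p=(x_1+x_2)/(n_1+n_2)$ is the pooled proportion, and the indices $(1,2)$ refer to the first and second rows of your table. In a 2×2 contingency table where $H_0:\; p_1=p_2$, this yields the same result as the ordinary $\chi^2$ test (without a continuity correction, $z^2$ equals the Pearson statistic, and prop.test() reports the test in its $\chi^2$ form):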

> tab <- matrix(c(100, 80, 20, 10), ncol = 2)
> chisq.test(tab)

    Pearson's Chi-squared test with Yates' continuity correction

data:  tab 
X-squared = 0.8823, df = 1, p-value = 0.3476

> prop.test(tab)

    2-sample test for equality of proportions with continuity correction

data:  tab 
X-squared = 0.8823, df = 1, p-value = 0.3476
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.15834617  0.04723506 
sample estimates:
   prop 1    prop 2 
0.8333333 0.8888889 
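
To make the link between the two tests explicit, here is a small sketch that computes the two-sample z statistic by hand for the same table and compares its square with the uncorrected Pearson statistic. It assumes, as prop.test() does for a two-column matrix, that the first column holds the "successes" and each row is one group; the variable names are only illustrative.

```r
tab <- matrix(c(100, 80, 20, 10), ncol = 2)

x <- tab[, 1]          # "successes": first column of the table
n <- rowSums(tab)      # group sizes: row totals
p_group  <- x / n      # observed proportion in each group
p_pooled <- sum(x) / sum(n)   # pooled proportion under H0

# Two-sample z statistic with pooled variance (no continuity correction).
z <- (p_group[1] - p_group[2]) /
  sqrt(p_pooled * (1 - p_pooled) * (1 / n[1] + 1 / n[2]))

# Turn off Yates' correction so the statistics are directly comparable.
chi_uncorrected  <- chisq.test(tab, correct = FALSE)$statistic
prop_uncorrected <- prop.test(tab, correct = FALSE)$statistic

all.equal(unname(z^2), unname(chi_uncorrected))
all.equal(unname(chi_uncorrected), unname(prop_uncorrected))
```

With the default `correct = TRUE`, both functions apply the same continuity correction, which is why the two printed outputs above agree as well.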

For the analysis of discrete data with R, I highly recommend the R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis (2002), by Laura Thompson.
