Solved – a good way of testing for a relationship between two count variables

correlation, count-data, poisson-regression, string-count

I have counts of occurrences of two types of words (A and B) in several texts. I would like to test whether the frequencies of occurrence of the two types of words across texts are 'correlated'. However, Pearson's correlation is probably not appropriate, because my data are not continuous and the counts are often quite low (sometimes zero).
What is a good way to test my hypothesis?

Best Answer

@Mattthew has answered your question: Spearman's $\rho$ will give you a measure of monotonic association between your variables. You can also perform inference on whether this correlation is, for example, different from zero using a straightforward t test.

To calculate $\boldsymbol{r}_{\textbf{S}}$ (assuming no ties):

  • Rank each of your variables independently.
  • Calculate the difference, $d_{i}$, between ranks for each observation/text (I am assuming from your question that the measures are paired: each text contributes a count of type $A$ words and a count of type $B$ words, across $n$ texts).
  • $r_{\text{S}} = 1 - \frac{6\sum_{i=1}^{n}{d_{i}^{2}}}{n\left(n^{2}-1\right)}$
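
The three steps above can be sketched in a few lines of Python. This is only an illustration under the no-ties assumption; the function names and the example counts are made up for this sketch and are not from the question.

```python
def rank_no_ties(values):
    """Rank values from 1 (smallest) to n. Assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_no_ties(a, b):
    """Spearman's r_S via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(a)
    ranks_a = rank_no_ties(a)
    ranks_b = rank_no_ties(b)
    # d_i is the difference between the two ranks of text i
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical counts of type A and type B words in five texts (no ties)
counts_a = [3, 0, 7, 1, 12]
counts_b = [2, 1, 9, 0, 5]
r_s = spearman_no_ties(counts_a, counts_b)
print(r_s)
```

With low counts, ties (for example several zeros) are likely, in which case this shortcut formula should not be used.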

The calculation for $\mathbf{r}_{\textbf{S}}$ (regardless of ties):

  • Rank each of your variables independently.

  • Calculations proceed as for Pearson's $r$, but using the ranked values ($r_A$ and $r_B$) of the paired observations:

    $r_{\text{S}} = \frac{\sum_{i=1}^{n}{\frac{r_{A,i} - \overline{r}_{A}}{s_{r_A}} \times \frac{r_{B,i} - \overline{r}_{B}}{s_{r_B}}}}{n-1}$
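
As a rough sketch of the ties-tolerant calculation: tied values are given the average of the ranks they span (the usual convention, which I am assuming here), and Pearson's formula is then applied to the ranks. The helper names and the example data are hypothetical.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Pearson's r computed on the ranks of a and b."""
    ranks_a, ranks_b = average_ranks(a), average_ranks(b)
    n = len(ranks_a)
    mean_a, mean_b = mean(ranks_a), mean(ranks_b)
    # Sample standard deviations (n - 1 denominator); both must be non-zero,
    # so this fails if one variable is constant across all texts.
    sd_a, sd_b = stdev(ranks_a), stdev(ranks_b)
    total = sum(
        ((ra - mean_a) / sd_a) * ((rb - mean_b) / sd_b)
        for ra, rb in zip(ranks_a, ranks_b)
    )
    return total / (n - 1)


# Hypothetical counts with ties (two texts contain zero type A words)
tied_ranks = average_ranks([0, 0, 2])
r_s_ties = spearman([0, 0, 2, 5, 1, 3], [1, 0, 0, 4, 2, 6])
# Without ties the result agrees with the shortcut formula
r_s_plain = spearman([3, 0, 7, 1, 12], [2, 1, 9, 0, 5])
print(tied_ranks, r_s_ties, r_s_plain)
```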

To test for evidence that $\mathbf{r_{\textbf{S}} \ne 0}$:

  • $\text{H}_{0}\text{: }r_{\text{S}} = 0$, $\text{H}_{\text{A}}\text{: }r_{\text{S}} \ne 0$

  • $t = r_{\text{S}}\sqrt{\frac{n-2}{1-r^{2}_{\text{S}}}}$

  • Base your rejection decision for $\text{H}_{0}$ on the t distribution, with $n-2$ degrees of freedom.
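
The test statistic is easy to compute by hand or in code. Below is a minimal sketch using only the standard library; the function name and the input values ($r_{\text{S}} = 0.8$, $n = 5$) are made up for illustration. The p-value itself would come from a t table or a statistics package.

```python
import math


def spearman_t(r_s, n):
    """Return (t, df) for testing H0: r_S = 0, with t = r_S * sqrt((n-2)/(1-r_S^2))."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r_s) >= 1:
        raise ValueError("t is undefined when |r_S| = 1")
    t = r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
    return t, n - 2


# Hypothetical example: r_S = 0.8 from n = 5 texts
t_stat, df = spearman_t(0.8, 5)
print(t_stat, df)
```

For this made-up example, $t \approx 2.31$ on 3 degrees of freedom, which is below the two-sided 5% critical value of 3.182, so $\text{H}_{0}$ would not be rejected despite the large $r_{\text{S}}$. That is typical with very few texts.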

Pagano, M., & Gauvreau, K. (2000). Principles of Biostatistics (2nd ed.). Duxbury Press.
