How to do a correlation between a Likert scale and an ordinal categorical measure

correlation, likert, ordinal-data

I have a 5-point Likert scale and a measure of time spent on the internet (e.g. 1 = less than 10 mins, 2 = 10-30 mins, etc.), as well as the number of SNS contacts (1 = 10-30, etc.). I would like to estimate their correlation, but I don't think I can do that directly. I computed a mean score for the Likert responses, but what about the other two measures?

Best Answer

What about one of the Kendall's $\tau$s? They are a kind of rank correlation coefficient for ordinal data.

Here's an example with Stata and $\tau_{b}$. A value of $-1$ indicates perfect negative association, $+1$ indicates perfect agreement, and zero indicates the absence of association. Here we see a modest, though significant, negative association between speed limits and accident rates.

. webuse hiway, clear
(Minnesota Highway Data, 1973)

. tab spdlimit rate, taub

           |    Accident rate per million
     Speed |          vehicle miles
     Limit |   Below 4        4-7    Above 7 |     Total
-----------+---------------------------------+----------
        40 |         1          0          0 |         1 
        45 |         1          1          1 |         3 
        50 |         1          4          2 |         7 
        55 |        10          4          1 |        15 
        60 |         9          2          0 |        11 
        65 |         1          0          0 |         1 
        70 |         1          0          0 |         1 
-----------+---------------------------------+----------
     Total |        24         11          4 |        39 

          Kendall's tau-b =  -0.4026  ASE = 0.116
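For readers without Stata, here is a minimal Python sketch (used only for illustration) that reproduces $\tau_{b}$ from the cross-tab by counting concordant and discordant pairs cell by cell and correcting for ties with the row and column totals. TABLE is transcribed from the output above, and the helper names are hypothetical, not part of any library.

```python
import math

# Speed limit (rows: 40, 45, ..., 70) by accident rate (columns: below 4, 4-7, above 7),
# transcribed from the Stata cross-tab above. Both margins are in ascending order.
TABLE = [
    [1, 0, 0],
    [1, 1, 1],
    [1, 4, 2],
    [10, 4, 1],
    [9, 2, 0],
    [1, 0, 0],
    [1, 0, 0],
]


def concordance_counts(table):
    """Return (P, Q), the numbers of concordant and discordant pairs."""
    p = q = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower_row in table[i + 1:]:          # observations with a larger row value
                p += n_ij * sum(lower_row[j + 1:])   # also a larger column -> concordant
                q += n_ij * sum(lower_row[:j])       # a smaller column -> discordant
    return p, q


def tied_pairs(margins):
    """Number of pairs sharing a category, given the category totals."""
    return sum(m * (m - 1) // 2 for m in margins)


def kendall_tau_b(table):
    p, q = concordance_counts(table)
    n = sum(sum(row) for row in table)
    n0 = n * (n - 1) // 2                                # all pairs
    n1 = tied_pairs(sum(row) for row in table)           # pairs tied on speed limit
    n2 = tied_pairs(sum(col) for col in zip(*table))     # pairs tied on accident rate
    return (p - q) / math.sqrt((n0 - n1) * (n0 - n2))


print(concordance_counts(TABLE))        # (62, 253)
print(f"{kendall_tau_b(TABLE):.4f}")    # -0.4026
```

On raw (unaggregated) data, scipy.stats.kendalltau computes the $\tau_{b}$ variant by default and should agree with this table-based count.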

You can also try an asymmetric modification of $\tau_{b}$ that corrects only for ties on the independent variable. This is called Somers' D:

. somersd rate spdlimit
Somers' D with variable: rate
Transformation: Untransformed
Valid observations: 39

Symmetric 95% CI
------------------------------------------------------------------------------
             |              Jackknife
        rate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    spdlimit |  -.4727723   .1395719    -3.39   0.001    -.7463282   -.1992163
------------------------------------------------------------------------------
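To see how this relates to $\tau_{b}$, write $P$ and $Q$ for the numbers of concordant and discordant pairs, $n_0 = n(n-1)/2$ for the total number of pairs, and $n_1$, $n_2$ for the numbers of pairs tied on speed limit and on accident rate. Counting from the cross-tab above (a worked sketch, checked by hand) gives $P = 62$, $Q = 253$, $n_0 = 741$, $n_1 = 184$ and $n_2 = 337$, so

$$\tau_b = \frac{P - Q}{\sqrt{(n_0 - n_1)(n_0 - n_2)}} = \frac{-191}{\sqrt{557 \cdot 404}} \approx -0.4026$$

$$D(\text{spdlimit} \mid \text{rate}) = \frac{P - Q}{n_0 - n_2} = \frac{-191}{404} \approx -0.4728, \qquad D(\text{rate} \mid \text{spdlimit}) = \frac{P - Q}{n_0 - n_1} = \frac{-191}{557} \approx -0.3429$$

The $-0.4728$ reported above matches the first Somers' D expression, which suggests that in somersd rate spdlimit the first variable listed (rate, as in "Somers' D with variable: rate") is treated as the independent variable whose ties are corrected for. Treating speed limit as the independent variable instead would give roughly $-0.343$.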

All these measures of association are related in that they classify all pairs of observations (highways in our example) as concordant or discordant. A pair is concordant if the observation with the larger value of variable $X$ (speed limit) also has the larger value of variable $Y$ (accident rate), and discordant if it has the smaller value of $Y$; pairs tied on $X$ or $Y$ are neither. The measures differ mainly in how tied pairs enter the denominator. There are more of them than you can shake a stick at (another is Goodman and Kruskal's $\gamma$, which drops tied pairs altogether, whereas $\tau_{a}$ makes no tie correction and divides by the total number of pairs). They will generally yield similar conclusions, even if they are not directly comparable.
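To make the pair classification concrete, here is a brute-force Python sketch (illustration only; helper names are hypothetical) that expands the highway table into one (speed code, rate code) tuple per highway, classifies all 741 pairs, and derives $\tau_{a}$, $\tau_{b}$, $\gamma$ and Somers' D in both directions from the same counts. It is $O(n^2)$, which is fine for 39 observations.

```python
import math
from itertools import combinations

# Cross-tab transcribed from the Stata output: rows = speed limit 40..70,
# columns = accident rate (below 4, 4-7, above 7). Codes only need to be ordered.
TABLE = [[1, 0, 0], [1, 1, 1], [1, 4, 2], [10, 4, 1], [9, 2, 0], [1, 0, 0], [1, 0, 0]]

# One (x, y) = (speed code, rate code) tuple per highway.
OBS = [(i, j) for i, row in enumerate(TABLE) for j, n in enumerate(row) for _ in range(n)]


def classify_pairs(data):
    """Count each unordered pair as concordant, discordant, or tied."""
    counts = {"concordant": 0, "discordant": 0, "tie_x": 0, "tie_y": 0, "tie_both": 0}
    for (x1, y1), (x2, y2) in combinations(data, 2):
        if x1 == x2 and y1 == y2:
            counts["tie_both"] += 1
        elif x1 == x2:
            counts["tie_x"] += 1          # tied on X only
        elif y1 == y2:
            counts["tie_y"] += 1          # tied on Y only
        elif (x1 - x2) * (y1 - y2) > 0:
            counts["concordant"] += 1     # larger X goes with larger Y
        else:
            counts["discordant"] += 1     # larger X goes with smaller Y
    return counts


def measures(data):
    c = classify_pairs(data)
    s = c["concordant"] - c["discordant"]
    total = sum(c.values())                                     # n(n-1)/2 pairs
    untied_x = c["concordant"] + c["discordant"] + c["tie_y"]   # pairs not tied on X
    untied_y = c["concordant"] + c["discordant"] + c["tie_x"]   # pairs not tied on Y
    return {
        "tau_a": s / total,                             # no tie correction
        "tau_b": s / math.sqrt(untied_x * untied_y),    # corrects for ties on both
        "gamma": s / (c["concordant"] + c["discordant"]),  # drops all tied pairs
        "D(Y|X)": s / untied_x,    # Y = rate, X = speed limit
        "D(X|Y)": s / untied_y,    # rate as the independent variable
    }


for name, value in measures(OBS).items():
    print(f"{name:>7}: {value:.4f}")
```

Running it prints $\tau_{a} \approx -0.2578$, $\tau_{b} \approx -0.4026$, $\gamma \approx -0.6063$, $D(Y \mid X) \approx -0.3429$ and $D(X \mid Y) \approx -0.4728$: all negative, but on noticeably different scales, which is why they are not directly comparable.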

The results above are qualitatively in line with Spearman's rank correlation coefficient mentioned by Greg (which tends to be larger in absolute value than $\tau$):

. ci2 rate spdlimit, spearman

Confidence interval for Spearman's rank correlation 
of rate and spdlimit, based on Fisher's transformation.
Correlation = -0.451 on 39 observations (95% CI: -0.671 to -0.158)

This measure does not consider pairs; instead it compares the orderings you would get if you used each variable separately to rank the observations (Stata assigns tied values the average of their ranks, and Spearman's coefficient is then just the Pearson correlation of the ranks). This makes it somewhat faster to compute, since you don't have to consider all $\frac{n(n-1)}{2}$ pairs. On the other hand, the sampling distribution of $\tau$ approaches normality much more quickly, so if you plan to do inference, $\tau$ might be the better measure. $\tau_b$ is the most common variant.
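The point that Spearman's coefficient is just a Pearson correlation on average ranks can be checked directly. This Python sketch (illustration only; helper names are hypothetical) ranks each variable with midranks for ties, correlates the ranks, and builds a Fisher-z interval, assuming the usual standard error $1/\sqrt{n-3}$ and a normal critical value of 1.96. Under those assumptions it lands on the same figures as the ci2 output above.

```python
import math

# Highway table expanded to one (speed limit, rate code) observation per highway.
SPEEDS = [40, 45, 50, 55, 60, 65, 70]
TABLE = [[1, 0, 0], [1, 1, 1], [1, 4, 2], [10, 4, 1], [9, 2, 0], [1, 0, 0], [1, 0, 0]]
speed = [SPEEDS[i] for i, row in enumerate(TABLE) for j, n in enumerate(row) for _ in range(n)]
rate = [j for i, row in enumerate(TABLE) for j, n in enumerate(row) for _ in range(n)]


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1   # sorted positions start..end hold ranks start+1..end+1
        for k in order[start:end + 1]:
            ranks[k] = avg
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_fisher_ci(x, y, z_crit=1.96):
    """Spearman's rho with an approximate 95% CI via Fisher's z transformation."""
    rho = pearson(average_ranks(x), average_ranks(y))
    z = math.atanh(rho)
    se = 1 / math.sqrt(len(x) - 3)
    return rho, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


rho, lo, hi = spearman_fisher_ci(speed, rate)
print(f"rho = {rho:.3f}, 95% CI: {lo:.3f} to {hi:.3f}")   # rho = -0.451, 95% CI: -0.671 to -0.158
```

Note that only a sort per variable is needed here, rather than a loop over all pairs.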
