Understanding warning message “Ties are present” in Kruskal-Wallis post hoc


I'm running post-hoc comparisons after a Kruskal-Wallis test. I'm using the PMCMR package.

> posthoc.kruskal.nemenyi.test( preference ~ instrument)

    Pairwise comparisons using Tukey and Kramer (Nemenyi) test  
                   with Tukey-Dist approximation for independent samples 

data:  preference by instrument 

       Cello Drums Guitar
Drums  0.157 -     -     
Guitar 0.400 0.953 -     
Harp   0.013 0.783 0.458 

P value adjustment method: none 

Warning message:
In posthoc.kruskal.nemenyi.test.default(c(50L, 50L, 50L, 50L, 49L,  :
  Ties are present, p-values are not corrected.

I'm confused by the warning message. Can anyone explain what it means and how I can correct it?

Best Answer

A tie means that several observations share the same value (and hence the same rank). For example, take a sample with observations $1, 3, 3, 5, 10, 10, 10$. The values $3$ and $10$ are tied: $3$ occurs $2$ times and $10$ occurs $3$ times. If every tied observation is given the smallest rank in its group, the sample has ranks $1, 2, 2, 4, 5, 5, 5$.
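As a quick check of the example above, here is a minimal R sketch that finds the tied values and reproduces those ranks. The variable names are illustrative, and `ties.method = "min"` is my reading of how the ranks in the example were assigned.

```r
# The sample from the example above
x <- c(1, 3, 3, 5, 10, 10, 10)

# Count how often each value occurs; a count above 1 marks a tie
counts <- table(x)
tied <- counts[counts > 1]
print(tied)

# Ranks where every member of a tie gets the smallest rank of its group
min_ranks <- rank(x, ties.method = "min")
print(min_ranks)

# One-line check for whether the sample contains any ties at all
has_ties <- anyDuplicated(x) > 0
print(has_ties)
```

Running a check like `anyDuplicated` on your own response variable tells you whether the warning is expected before you run the post-hoc test.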

When ties are present, we usually need to handle them; if we do not, we are likely to get a warning like the one you showed. Conventionally, ties are broken in the ranks rather than in the original observations. Since the Kruskal-Wallis test is based on ranks, it is enough to restrict the discussion to ranks.

Two tie-breaking methods are common. The first is breaking ties at random: we assign distinct ranks to the tied observations in random order. Continuing the example above, for the tied ranks "$2, 2$" we draw two numbers without replacement from the set $\{2, 3\}$ and assign them to the second and third positions, for example "$3, 2$". We do the same for the three observations tied at $10$, drawing from $\{5, 6, 7\}$. One possible set of adjusted ranks is $1, 3, 2, 4, 6, 5, 7$, in which the ties are broken. The disadvantage of this method is that repeating the analysis can give a different test statistic each time, because the tie-breaking is random.
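In R, random tie-breaking is available through `rank` with `ties.method = "random"`. The sketch below uses an arbitrary seed, chosen only so that a run can be repeated. It draws two sets of ranks for the same sample, so you can see that only the tied positions are free to change between draws.

```r
x <- c(1, 3, 3, 5, 10, 10, 10)

set.seed(1)  # arbitrary seed, only so the run is repeatable

# Two independent random tie-breakings of the same sample
r1 <- rank(x, ties.method = "random")
r2 <- rank(x, ties.method = "random")
print(r1)
print(r2)

# Untied observations (positions 1 and 4) always keep ranks 1 and 4;
# positions 2-3 share the ranks {2, 3}, positions 5-7 share {5, 6, 7}
print(sort(r1[2:3]))
print(sort(r1[5:7]))
```

Because any statistic computed from such ranks depends on the draw, results are only reproducible if the seed is fixed and reported.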

The second method is averaging: each tied observation is assigned the average of the ranks that its group of ties would occupy. With this method, the ranks in the example become $1, 2.5, 2.5, 4, 6, 6, 6$. This method adjusts for the ties rather than breaking them.
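Averaging is the default behaviour of R's `rank`. The short sketch below reproduces the averaged ranks for the example sample. It also checks that averaged ranks still sum to $n(n+1)/2$, just as untied ranks would.

```r
x <- c(1, 3, 3, 5, 10, 10, 10)

# ties.method = "average" is the default; it is spelled out here for clarity
avg_ranks <- rank(x, ties.method = "average")
print(avg_ranks)

# Averaged ranks preserve the total rank sum n(n+1)/2
n <- length(x)
sum_preserved <- sum(avg_ranks) == n * (n + 1) / 2
print(sum_preserved)
```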

Software often lets you choose how ties are handled; consult the documentation of the function you are using to see which options it offers.
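To illustrate what a tie correction can look like, the sketch below uses base R's `kruskal.test` rather than PMCMR. The toy data are hypothetical and are not the data from the question. The statistic reported by `kruskal.test` equals the uncorrected $H$, computed from average ranks, divided by $1 - \sum (t^3 - t) / (n^3 - n)$, where $t$ runs over the sizes of the tied groups. Whether and how `posthoc.kruskal.nemenyi.test` applies a comparable correction is something to confirm in the PMCMR documentation.

```r
# Hypothetical toy data containing ties (not the data from the question)
score <- c(1, 3, 3, 5, 10, 10, 10, 2, 3, 7, 10, 12)
group <- factor(rep(c("A", "B", "C"), each = 4))

n <- length(score)
r <- rank(score)                      # average ranks for ties
rank_sums <- tapply(r, group, sum)    # rank sum per group
sizes <- tapply(r, group, length)     # observations per group

# Kruskal-Wallis H without any tie correction
H <- 12 / (n * (n + 1)) * sum(rank_sums^2 / sizes) - 3 * (n + 1)

# Tie correction factor: 1 - sum(t^3 - t) / (n^3 - n)
tie_sizes <- table(score)
correction <- 1 - sum(tie_sizes^3 - tie_sizes) / (n^3 - n)
H_corrected <- H / correction

# kruskal.test reports the tie-corrected statistic
kw <- kruskal.test(score, group)
print(c(uncorrected = H, corrected = H_corrected))
print(all.equal(unname(kw$statistic), H_corrected))
```

The correction factor is below 1 whenever ties exist, so the corrected statistic is larger than the uncorrected one.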

For a similar discussion of this issue, see "How does ties.method argument of R's rank function work?"
