Solved – Understanding warning message “Ties are present” in Kruskal-Wallis post hoc

I'm running post-hoc comparisons after a Kruskal-Wallis test. I'm using the PMCMR package.

> posthoc.kruskal.nemenyi.test( preference ~ instrument)

    Pairwise comparisons using Tukey and Kramer (Nemenyi) test  
                   with Tukey-Dist approximation for independent samples 

data:  preference by instrument 

       Cello Drums Guitar
Drums  0.157 -     -     
Guitar 0.400 0.953 -     
Harp   0.013 0.783 0.458 

P value adjustment method: none 

Warning message:
In posthoc.kruskal.nemenyi.test.default(c(50L, 50L, 50L, 50L, 49L,  :
  Ties are present, p-values are not corrected.

I'm confused by the warning message. Can anyone explain what it means and how I can correct it?

Best Answer

A tie means that several observations share the same value (and hence the same rank). For example, consider a sample with observations $1, 3, 3, 5, 10, 10, 10$. The values $3$ and $10$ are each tied: $3$ occurs $2$ times and $10$ occurs $3$ times. If every tied observation is given the lowest rank in its group, the sample has ranks $1, 2, 2, 4, 5, 5, 5$.
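As a quick check in R, the ranks in this example match what `rank()` returns with `ties.method = "min"`. The variable names below are only illustrative.

```r
# Example sample from the text; `x` is an illustrative name.
x <- c(1, 3, 3, 5, 10, 10, 10)

# Count how often each value occurs; counts above 1 indicate ties.
tie_counts <- table(x)
tie_counts[tie_counts > 1]   # 3 occurs twice, 10 occurs three times

# Give every tied observation the lowest rank of its group.
min_ranks <- rank(x, ties.method = "min")
min_ranks                    # 1 2 2 4 5 5 5
```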

When ties are present, they usually have to be handled in some way (otherwise you will probably get a warning like the one you showed). Conventionally, ties are broken in the ranks rather than in the original observations. Since the Kruskal-Wallis test is based on ranks, it is enough to answer your question in terms of the ranks.

Two tie-handling methods are common. The first is breaking ties at random: we randomly assign distinct ranks among the tied observations. Continuing the above example, for the tied ranks "$2, 2$" we draw two numbers without replacement from the set $\{2, 3\}$ and assign them to the second and third positions, for example "$3, 2$". We do the same for the three observations tied at $10$, drawing from $\{5, 6, 7\}$. One possible set of adjusted ranks is $1, 3, 2, 4, 6, 5, 7$, in which the ties are broken. The disadvantage of this method is that the test statistic can differ from one run of the analysis to the next, since the tie-breaking is random.
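A minimal sketch of random tie-breaking with base R's `rank()`. The seed is arbitrary and only there to make a single run repeatable; with a different seed the tied positions may come out in a different order.

```r
# Example sample from the text; `x` is an illustrative name.
x <- c(1, 3, 3, 5, 10, 10, 10)

set.seed(1)  # arbitrary seed, used only for repeatability
random_ranks <- rank(x, ties.method = "random")
random_ranks

# Untied observations keep their ranks; each group of tied observations
# receives a random permutation of the ranks it occupies.
sort(random_ranks)  # always 1 2 3 4 5 6 7
```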

The second method is averaging: each tied observation is assigned the average of the ranks its group would otherwise occupy. With this method, the ranks become $1, 2.5, 2.5, 4, 6, 6, 6$. This method adjusts for the ties rather than breaking them.
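The averaged ranks can be reproduced with base R's `rank()`, where `"average"` is the default `ties.method`. Again, `x` is just an illustrative name.

```r
# Example sample from the text.
x <- c(1, 3, 3, 5, 10, 10, 10)

# Each tied observation gets the mean of the ranks its group occupies:
# ranks 2 and 3 become 2.5; ranks 5, 6 and 7 become 6.
avg_ranks <- rank(x, ties.method = "average")
avg_ranks  # 1.0 2.5 2.5 4.0 6.0 6.0 6.0

# Averaging preserves the total of the ranks.
sum(avg_ranks) == sum(seq_along(x))
```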

In software, you can usually specify how ties are handled; consult the documentation of the function you are using for the available options.
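Besides choosing how ranks are assigned, rank-based tests often apply a tie correction to the test statistic, which is presumably what the "not corrected" in the warning refers to. The sketch below shows the standard correction for the omnibus Kruskal-Wallis statistic on made-up data and checks it against base R's `kruskal.test()`. It does not reproduce the PMCMR post hoc test, whose handling of ties should be checked in that package's documentation.

```r
# Hypothetical data containing ties (not the data from the question).
y <- c(50, 50, 49, 47, 50, 48, 47, 46, 49, 48, 45, 45)
g <- factor(rep(c("A", "B", "C"), each = 4))

n <- length(y)
r <- rank(y)                      # average ranks for tied values
rank_sums <- tapply(r, g, sum)    # sum of ranks per group
group_n   <- tapply(r, g, length) # group sizes

# Uncorrected Kruskal-Wallis statistic.
H <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_n) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n),
# where t is the number of observations sharing each distinct value.
t_sizes <- as.vector(table(y))
C <- 1 - sum(t_sizes^3 - t_sizes) / (n^3 - n)
H_corrected <- H / C

# kruskal.test() reports the tie-corrected statistic.
all.equal(H_corrected, unname(kruskal.test(y, g)$statistic))
```

With no ties the correction factor equals 1, so the corrected and uncorrected statistics coincide.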

For a similar discussion of this issue, see the question "How does ties.method argument of R's rank function work?"

Related Solutions

Solved – Addressing “NOTE: Results may be misleading due to involvement in interactions” warning with Tukey post-hoc comparisons in lsmeans R package

My view is that the $F$ test of statistical significance of the interaction effect is less important than the subjective nature of the interaction, as evidenced by the plot. The plot tells me that it is reasonably sensible to compare the overall averages of Depression and Top, but it'd be silly to compare those averages with the overall average of Slope -- whether or not these comparisons are statistically significant. Basically, I'd say to avoid doing comparisons that don't make sense -- so my advice is do not ignore the warning note in this case. If the curve for Top were fairly parallel with the other two, that's when you could ignore it.

In general, I suggest looking at enough plots that you can tell what's going on, and then restrict your post-hoc testing to things that are sensible.

Since P is continuous, you're really fitting straight lines (they look curved because you chose unequally spaced points). You can compare the slopes of these lines:

R> lstrends(Dens.LMER, pairwise ~ Contour, var = "P")

$lstrends
 Contour        P.trend          SE    df    lower.CL     upper.CL
 Depression -0.00681143 0.004901195 39.68 -0.01671957  0.003096714
 Slope      -0.03376293 0.010533875 41.88 -0.05502295 -0.012502911
 Top        -0.01306992 0.010499548 41.97 -0.03425936  0.008119525

Confidence level used: 0.95 

$contrasts
 contrast               estimate         SE    df t.ratio p.value
 Depression - Slope  0.026951501 0.01161827 42.00   2.320  0.0639
 Depression - Top    0.006258486 0.01158716 41.81   0.540  0.8520
 Slope - Top        -0.020693015 0.01487290 41.99  -1.391  0.3545

P value adjustment: tukey method for a family of 3 tests

The comparison between the shallowest and steepest slopes (Depression and Slope) has an adjusted $P$ value of about $.06$.

Solved – Kruskal-Wallis and post-hoc analysis in R

Heteroskedasticity seems to be the thing you're most worried about, so why go to Kruskal-Wallis rather than just a Welch adjustment? However, it happens that your standard deviations are almost constant (3.9, 4.8, 4.7). Why would that very modest amount of change in spread be of concern?
A rejection of the omnibus null doesn't necessarily imply that any of the individual comparisons will be significant.
Formal hypothesis tests of assumptions aren't necessarily useful. We don't necessarily believe any of the assumptions are exactly true; what matters is their impact on your inference, which a p-value in a hypothesis test really doesn't tell you. You might easily reject the null of constant variance, but if the standard deviations by group don't change by a substantial amount (possibly by a good deal more than you can detect by a test, depending on sample size), it may hardly matter. On the other hand, failure to reject in small samples should be no consolation at all.
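For illustration, here is a minimal sketch of the Welch adjustment mentioned above, next to the Kruskal-Wallis test, using base R. The data are simulated and entirely hypothetical; only the three standard deviations are taken from the text, while the means, group sizes, and variable names are assumptions.

```r
set.seed(42)  # arbitrary seed, used only for repeatability

# Hypothetical data: three groups with the standard deviations quoted above.
score <- c(rnorm(20, mean = 50, sd = 3.9),
           rnorm(20, mean = 52, sd = 4.8),
           rnorm(20, mean = 51, sd = 4.7))
group <- factor(rep(c("g1", "g2", "g3"), each = 20))

# Welch's one-way test: compares means without assuming equal variances.
welch <- oneway.test(score ~ group, var.equal = FALSE)
welch

# Rank-based alternative, for comparison.
kw <- kruskal.test(score ~ group)
kw
```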
