Solved – Correct interpretation of ANOVA post-hoc results. Which group is the best one

hypothesis-testing, multiple-comparisons, post-hoc

Consider a situation in which one has to determine which treatment or treatments (of A, B, C, D, E, F, G, H) are the most effective (highest decrease) and which are the least effective (lowest decrease; see the figure below).

[Figure: box plots with jittered data points for treatments A to H, with compact letter display letters above each box]

An ANOVA/Kruskal-Wallis-like test showed a statistically significant difference. I then ran post-hoc pairwise comparisons and summarized the results in the plot. Lowercase letters a, b, c, ... (the compact letter display, cld) above the box plots and jittered data points indicate statistical (in)significance concisely: if two treatment groups share a letter, the difference between them is not statistically significant. For example, the comparison of treatments G and H is not significant ($p \ge 0.05$) because G and H share the letter "e".
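The reading rule for the compact letter display can be sketched in a few lines of Python (the analysis itself was done in R, so this is only an illustration). The dictionary holds just the letters quoted in this question for D, E, G and H; the function name `share_letter` is a hypothetical helper, not part of any package.

```python
# Compact letter display (cld) reading rule: two groups whose letter sets
# overlap are NOT significantly different from each other.
# Only the letter assignments stated in the question are included here.
cld = {"D": "cd", "E": "de", "G": "e", "H": "e"}


def share_letter(group_1, group_2, letters=cld):
    """Return True if the two groups have at least one cld letter in common."""
    return bool(set(letters[group_1]) & set(letters[group_2]))


print(share_letter("G", "H"))  # G and H both carry "e"
print(share_letter("D", "E"))  # D and E both carry "d"
print(share_letter("D", "G"))  # D ("cd") and G ("e") have no letter in common
```

Note that D overlaps with E and E overlaps with G, yet D and G do not overlap. The relation is not transitive, which is why the letters do not split the treatments into strict tiers.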

Questions:

  1. It is not clear to me: based on these results, how should I answer the question of which treatment (or group of treatments) is the most effective and which is the least effective?
  2. Is it correct to state that the treatments that share the letter "a" are the least effective and the ones that share "e" are the most effective? Wouldn't that be a misinterpretation of the results, given that there is no strict boundary between the groups of treatments? For example, treatment G has the letter "e" but E has the letters "d" and "e", treatment D has "c" and "d", and so on.

For the analysis, I used R and the OrchardSprays dataset.

My question is related to this one but touches different aspects of result interpretation.

Best Answer

It seems to me that you want criteria based on statistical tests for describing the relationships between the treatment efficacies. Don't do that! Instead, describe the ordering of the treatments by effect size (using the mean and/or median effect of each treatment) and leave it there.
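As a minimal sketch of that kind of description, the snippet below orders treatments by a summary of their effect. The numbers are made up for illustration (they are not the OrchardSprays values), and `rank_by_effect` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical "decrease" measurements per treatment (made-up numbers,
# NOT the OrchardSprays data).
decrease = {
    "A": [2, 4, 5, 3],
    "B": [8, 6, 10, 7],
    "C": [20, 25, 18, 30],
}


def rank_by_effect(data, summary=median):
    """Return (treatment, summary value) pairs, largest effect first."""
    summaries = {name: summary(values) for name, values in data.items()}
    return sorted(summaries.items(), key=lambda item: item[1], reverse=True)


# Ordering by median; pass summary=mean to order by the mean instead.
for name, value in rank_by_effect(decrease):
    print(name, value)
```

Reporting both the mean-based and the median-based ordering shows quickly whether the ranking depends on the choice of summary.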

The sample sizes are small relative to both the within-treatment variability and the differences in effect size between treatments, so the label "most effective" that you assign from these data might not hold in a larger population.
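One rough way to see how fragile such a label can be is to resample the data and count how often each treatment comes out on top. The sketch below uses a simple bootstrap on made-up numbers (not the OrchardSprays values); the function name `top_treatment_shares`, the number of resamples and the seed are all arbitrary choices for illustration.

```python
import random
from collections import Counter
from statistics import median

# Hypothetical small samples (made-up numbers): F and G overlap heavily,
# while A is clearly lower.
decrease = {
    "F": [40, 72, 55, 90, 38, 61, 84, 47],
    "G": [45, 95, 50, 77, 58, 66, 36, 88],
    "A": [2, 5, 3, 4, 6, 2, 5, 3],
}


def top_treatment_shares(data, n_resamples=2000, seed=1):
    """Bootstrap each group and return the share of resamples in which
    each treatment has the largest median."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        medians = {
            name: median(rng.choices(values, k=len(values)))
            for name, values in data.items()
        }
        wins[max(medians, key=medians.get)] += 1
    return {name: wins[name] / n_resamples for name in data}


print(top_treatment_shares(decrease))
```

If the top spot is split between two or more treatments across resamples, the data do not support singling out one of them as the most effective.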

(It is worth noting that all of the treatments appear to be at least minimally effective, and so a dose-response curve for each treatment might show that they are equally effective but differ in potency. Do not discard drug candidates on the basis of a single dose study.)
