Solved – Acceptance of null hypothesis

hypothesis testing

This is a discussion question at the intersection of statistics and other sciences. I often face the same problem: researchers in my field tend to say that there is no effect whenever the p-value is not below the significance level. At first I simply replied that this is not how hypothesis testing works, but given how often the question arises, I would like to discuss the issue with more experienced statisticians.

Let us consider a recent paper in a scientific journal from “the best publishing group”, Nature Communications Biology (there are many examples, but let's focus on one).

Researchers interpret a not statistically significant result in the following way:

Thus chronic moderate caloric restriction can extend lifespan and
enhance health of a primate, but it affects brain grey matter
integrity without affecting cognitive performances.

Proof:

However, performances in the Barnes maze task were not different
between control and calorie-restricted animals (LME: F = 0.05,
p = 0.82; Fig. 2a). Similarly, the spontaneous alternation task did
not reveal any difference between control and calorie-restricted
animals (LME: F = 1.63, p = 0.22; Fig. 2b).

The authors also suggest an explanation for the absence of the effect, but the key point here is not the explanation, it is the claim itself. The plots they provide look noticeably different "by eye" to me (Figure 2).

Moreover, the authors ignore prior knowledge:

deleterious effects of caloric restriction on cognitive performance
have been reported for rats and for cerebral and emotional functions
in humans

I could understand such a claim for huge sample sizes (where "no effect" effectively means "no practically significant effect"), but in this particular situation complex tests were used and it is not obvious to me how to perform power calculations.
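
What I mean by a power calculation, in the simplest case, is something like the sketch below. It assumes a plain two-sample t-test rather than the linear mixed-effects models actually used in the paper (for those one would typically estimate power by simulation), and the group size and effect sizes are invented purely for illustration.

```python
# Minimal power sketch for a plain two-group comparison (not the LME used in
# the paper): what power does a given design have, and how large an effect
# could it plausibly detect?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

n_per_group = 17   # hypothetical group size, for illustration only
alpha = 0.05

# Power to detect a "medium" standardized effect (Cohen's d = 0.5).
power_medium = analysis.power(effect_size=0.5, nobs1=n_per_group,
                              alpha=alpha, ratio=1.0)

# Smallest standardized effect detectable with 80% power.
detectable_d = analysis.solve_power(nobs1=n_per_group, alpha=alpha,
                                    power=0.80, ratio=1.0)

print(f"power for d = 0.5: {power_medium:.2f}")
print(f"detectable d at 80% power: {detectable_d:.2f}")
```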

Questions:

  1. Did I overlook any details that make their conclusions valid?

  2. Taking into account the need to report negative results in science, how can one show with statistics that it is not merely "the absence of a result" (which is all that $p > \alpha$ gives us) but "a negative result" (e.g., there is no difference between groups)? I understand that for huge sample sizes even small deviations from the null cause rejection, but let's assume that we have ideal data and still need to show that the null is practically true (one standard tool for this, equivalence testing, is sketched after this list).

  3. Should statisticians always insist on mathematically correct conclusions such as "with this power we were not able to detect an effect of practically significant size"? Researchers from other fields strongly dislike such formulations of negative results.
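
From the purely statistical point of view, the answer to question 2 that I am aware of is equivalence testing (two one-sided tests, TOST): pre-specify a margin within which a difference is considered practically irrelevant and test whether the true difference lies inside it. The sketch below uses made-up data and an arbitrary margin, so the numbers carry no meaning beyond illustration.

```python
# Equivalence testing (TOST) sketch: instead of failing to reject "no
# difference", actively test whether the group difference lies inside a
# pre-specified margin of practical irrelevance. Data and margin are invented.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=10.0, size=20)      # hypothetical scores
restricted = rng.normal(loc=101.0, scale=10.0, size=20)

# Equivalence margin chosen *before* seeing the data, on subject-matter
# grounds: differences smaller than +/- 5 points are practically irrelevant.
low, upp = -5.0, 5.0

p_tost, res_lower, res_upper = ttost_ind(control, restricted, low, upp,
                                         usevar='pooled')
print(f"TOST p-value: {p_tost:.3f}")
# A small p_tost supports "the true difference lies inside the margin",
# i.e. a genuine negative result rather than a mere absence of evidence.
```

A small TOST p-value is the statistical statement "the groups are equivalent within the stated margin", but I am not sure how to translate this kind of statement into language that collaborators from other fields will accept.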

I would be glad to hear any thoughts on this problem. I have read and understood the related questions on this site; from a purely statistical point of view there is a clear answer to questions 2 and 3, but I would like to understand how these questions should be answered in an interdisciplinary dialogue.

UPD: I think a good example of a negative result is the first stage of medical trials, which assesses safety. When can scientists decide that a drug is safe? I suppose they compare two groups and run statistics on those data. Is there a way to say that the drug is safe? Cochrane uses the careful wording "no side effects were found", yet doctors say the drug is safe. Where is the balance between accuracy and simplicity of description, so that we can say "there are no consequences for health"?
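
As far as I understand, the statistical version of "this drug is safe" would again be a margin-based statement: pre-specify how much excess risk is still acceptable and check whether the confidence interval for the difference in adverse-event rates stays below that margin. The sketch below uses invented counts, an arbitrary margin, and a simple Wald interval rather than whatever method a real safety analysis would use.

```python
# Toy sketch for the drug-safety question (all numbers invented): a Wald
# confidence interval for the difference in adverse-event rates, judged
# against a pre-specified margin of acceptable excess risk.
import numpy as np

events_drug, n_drug = 6, 150         # hypothetical adverse events on drug
events_placebo, n_placebo = 5, 150   # hypothetical adverse events on placebo

p1, p2 = events_drug / n_drug, events_placebo / n_placebo
diff = p1 - p2
se = np.sqrt(p1 * (1 - p1) / n_drug + p2 * (1 - p2) / n_placebo)

z = 1.96                             # ~95% two-sided normal quantile
ci_low, ci_high = diff - z * se, diff + z * se

margin = 0.05                        # assumed acceptable excess risk (5 points)
print(f"risk difference: {diff:.3f}, 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print("within safety margin" if ci_high < margin else "safety not demonstrated")
```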

Best Answer

I think it is at times appropriate to interpret non-statistically-significant results in the spirit of "accept the null hypothesis". In fact, I have seen statistically significant results interpreted in exactly that fashion: the study was so precise that the results were consistent only with a narrow range of non-null but clinically insignificant effects. Here's a somewhat blistering critique of a study (or rather of its press coverage) about the relation between chocolate/red wine consumption and its "salubrious" effect on diabetes. The probability curves for the insulin-resistance distributions by high/low intake are hysterical.

Whether one can interpret findings as "confirming $H_0$" depends on a great number of factors: the validity of the study, the power, the uncertainty of the estimate, and the prior evidence. Reporting the confidence interval (CI) instead of the p-value is perhaps the most useful contribution you can make as a statistician. I remind researchers and fellow statisticians that statistics do not make decisions, people do; omitting p-values actually encourages a more thoughtful discussion of the findings.

The width of the CI describes the range of effects compatible with the data: it may or may not include the null, and it may or may not include clinically important values such as a life-saving benefit. A narrow CI, however, pins down one type of effect: either an effect that is "significant" in the substantive sense, or one that is the null or something very close to it.
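
To make the "report the CI" advice concrete, here is a minimal sketch of what I would want to see reported for a two-group comparison; the data are made up and a pooled-variance interval is assumed, which is obviously simpler than the mixed-effects models in the paper.

```python
# Sketch of "report the CI, not just the p-value": a 95% confidence interval
# for the difference in group means, using invented data and a pooled-variance
# two-sample setup.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=50.0, scale=8.0, size=25)   # hypothetical scores
treated = rng.normal(loc=49.0, scale=8.0, size=25)

n1, n2 = control.size, treated.size
diff = treated.mean() - control.mean()

# Pooled-variance standard error of the difference in means.
sp2 = ((n1 - 1) * control.var(ddof=1) + (n2 - 1) * treated.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

# The interval, not the p-value, shows which clinically relevant effects
# remain compatible with the data.
print(f"difference: {diff:.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Whether that interval excludes effects you would care about clinically is precisely the question the p-value alone cannot answer.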

Perhaps what is needed is a broader sense of what "null results" (and null effects) are. What I find disappointing in research collaborations is when investigators cannot state a priori what range of effects they are targeting: if an intervention is meant to lower blood pressure, by how many mmHg? If a drug is meant to cure cancer, by how many months should it extend survival? Someone who is passionate about research and "plugged in" to their field and its science can rattle off the most amazing facts about prior research and what has been done.

In your example, I can't help but notice the p-value of 0.82: it suggests that the estimated effect is very close to the null. From that alone, all I can tell is that the CI is centered on a null value; what I do not know is whether it also encompasses clinically significant effects. If the CI is very narrow, the interpretation the authors give is, in my opinion, correct, even though the data as reported do not support it: adding the CI would be a minor edit. In contrast, the second p-value of 0.22 is relatively closer to its significance threshold (whatever that may be), and the authors correspondingly interpret it as "not giving any evidence of difference", which is consistent with a "do not reject $H_0$"-type interpretation. As far as the relevance of the article goes, I can say very little; I hope you browse the literature and find more salient discussions of study findings. As far as the analyses go: just report the CI and be done with it!