Solved – Does zero correlation always imply that the two variables are not related, even in a smaller sample

correlation

If we find that two variables are not correlated (the correlation coefficient is very weak or exactly 0) in a large population, is it possible that within a smaller, more concentrated subpopulation there is still a significant correlation between the two?

I am referring to this Skeptics SE answer, which says there is no correlation between porn use and erectile dysfunction (ED). Could it be that porn use leads to ED for some people but helps to combat ED in others, so that the two effects cancel each other out and result in weak or no correlation overall?

If the answer to the above question is yes, then one of the implications is that the urologists should also prohibit porn use for erectile dysfunction patients.

Best Answer

You're describing the possibility of a missing explanatory variable (in your scenario, one that interacts with porn use in its effect on erectile dysfunction).
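To make the cancelling scenario concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the two subgroups, the slopes of +0.5 and -0.5, the noise level, and the helper name `pearson` are assumptions, not anything taken from the study discussed in the question.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000                # hypothetical size of each subgroup

# Hypothetical subgroup A: the outcome rises with exposure (slope +0.5).
x_a = [rng.uniform(0, 10) for _ in range(n)]
y_a = [2.5 + 0.5 * x + rng.gauss(0, 1) for x in x_a]

# Hypothetical subgroup B: the outcome falls with exposure (slope -0.5).
x_b = [rng.uniform(0, 10) for _ in range(n)]
y_b = [7.5 - 0.5 * x + rng.gauss(0, 1) for x in x_b]

r_a = pearson(x_a, y_a)                   # clearly positive
r_b = pearson(x_b, y_b)                   # clearly negative
r_pooled = pearson(x_a + x_b, y_a + y_b)  # close to zero

print(f"subgroup A: {r_a:+.2f}")
print(f"subgroup B: {r_b:+.2f}")
print(f"pooled:     {r_pooled:+.2f}")
```

If the subgroup label is never recorded, only the pooled figure is visible, and it hides two fairly strong relationships of opposite sign.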

An interaction isn't necessary for a problem like this to arise; it can happen whenever the ignored variable is not evenly distributed across the variables you do have. See Simpson's paradox, especially the diagram, which shows that the correlation can even have entirely the wrong sign if you ignore such a variable, even if the direction of the relationship is the same within both groups!
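A small simulated example of that sign reversal may help. This is only a sketch with invented numbers: both hypothetical groups have the same within-group slope of +1, but the group with the larger x values has a much lower baseline, so the pooled correlation comes out negative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 1000                # hypothetical size of each group

# Hypothetical group 1: low x values, high baseline, slope +1.
x1 = [rng.uniform(0, 4) for _ in range(n)]
y1 = [10 + x + rng.gauss(0, 0.5) for x in x1]

# Hypothetical group 2: high x values, low baseline, same slope +1.
x2 = [rng.uniform(6, 10) for _ in range(n)]
y2 = [-4 + x + rng.gauss(0, 0.5) for x in x2]

r1 = pearson(x1, y1)                # positive within group 1
r2 = pearson(x2, y2)                # positive within group 2
r_pooled = pearson(x1 + x2, y1 + y2)  # negative once the groups are merged

print(f"group 1: {r1:+.2f}")
print(f"group 2: {r2:+.2f}")
print(f"pooled:  {r_pooled:+.2f}")
```

No interaction is involved here: the slope is identical in both groups. The reversal comes purely from the group variable being unevenly distributed across x.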

Clearly, if you ignore an important variable, that can make an important relationship look like no relationship, or vice versa (or, worse, make the correlation appear to have the opposite sign to the actual conditional relationship). Or there may be another variable that causes both of the ones you observe to change together, even though they are not connected to each other.
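The common-cause case can be sketched the same way. In this made-up simulation, a hypothetical variable `z` drives both `x` and `y`, which have no effect on each other. The raw correlation between `x` and `y` is clearly positive, but it vanishes once `z` is adjusted for (here by correlating the residuals from simple regressions on `z`, i.e. a partial correlation). The helper names `pearson` and `residuals` are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, zs):
    """Residuals from a simple least-squares regression of ys on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 5000                # hypothetical sample size

# z is the hypothetical common cause; x and y each depend on z only.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)  # raw correlation, induced entirely by z
r_xy_given_z = pearson(residuals(x, z), residuals(y, z))  # near zero

print(f"corr(x, y):           {r_xy:+.2f}")
print(f"corr(x, y) given z:   {r_xy_given_z:+.2f}")
```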

There's also the issue that it's possible to have dependence that is not linear* -- two things may be strongly related but uncorrelated (e.g. what if erectile dysfunction is highest if you watch no porn and if you watch a lot of porn, and lowest in between? That could show up as zero correlation, but would not imply that you are safe with all levels of porn activity).

*(See the bottom row of the first diagram there. Edit: the image also appears in pzsolt's answer.)
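The "high at both extremes, lowest in between" case is easy to simulate. The following is a sketch with invented numbers (a U-shaped curve centred at 5 on a 0 to 10 scale, plus noise); it is not based on any real data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # fixed seed so the run is repeatable
n = 5000                # hypothetical sample size

# y depends strongly on x, but through a symmetric U shape around x = 5.
x = [rng.uniform(0, 10) for _ in range(n)]
y = [(xi - 5) ** 2 + rng.gauss(0, 1) for xi in x]

r_linear = pearson(x, y)  # near zero: no linear trend
# Correlating y with the squared distance from the centre reveals the link.
r_curved = pearson([(xi - 5) ** 2 for xi in x], y)

print(f"corr(x, y):          {r_linear:+.2f}")
print(f"corr((x - 5)^2, y):  {r_curved:+.2f}")
```

The Pearson coefficient only measures linear association, so here it is close to zero even though y is almost completely determined by x. A scatter plot would show the relationship immediately.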

There are a variety of other ways in which what looks like zero correlation does not imply "no relationship"; conversely, there are a number of situations where there appears to be a relationship but there is really no causal connection at all.

There's much more that could be said -- for example, I haven't even touched on spurious regression yet (though it's related to some of the things I have mentioned).