Sorry for the confusing title. I think this is a general statistics question, but I'm working in R. I have two samples from different countries (n=240 and n=1,010). When I run the same linear regression on the same three variables in each dataset separately, both produce a significant result, with almost identical coefficients. However, when I merge the datasets and run the same regression on the combined data, the result is no longer significant. Can anyone explain this?
In case it matters, the regression has the form lm(a~b*c).
Best Answer
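Without seeing your data, this is difficult to answer definitively. One possibility is that your datasets span different ranges of the independent variables, or differ in the average level of the outcome. It is well known that combining data across groups can weaken, cancel, or even reverse a relationship seen within each group individually, because the pooled fit also has to absorb the differences between the groups. This effect is known as Simpson's Paradox. If that is what is happening here, adding the country as a term in the combined model should bring back the within-country result.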
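To illustrate how this could happen, here is a minimal sketch with simulated data (not your data). Everything in it is an assumption made up for illustration: the column name country, the group means, the intercepts, and the true slope of 0.5 on b. The two groups share the same slope, but group B has larger values of b and a lower baseline for a, so the between-group trend works against the within-group trend.

```r
# Simulated example only: all numbers below are made-up assumptions.
set.seed(1)
n1 <- 240
n2 <- 1010

# Two countries with the same within-country slope on b (0.5),
# but different ranges of b and different baselines for a.
d1 <- data.frame(country = "A", b = rnorm(n1, mean = 0), c = rnorm(n1))
d2 <- data.frame(country = "B", b = rnorm(n2, mean = 3), c = rnorm(n2))
d1$a <- 0.0 + 0.5 * d1$b + rnorm(n1)
d2$a <- -2.6 + 0.5 * d2$b + rnorm(n2)

pooled <- rbind(d1, d2)

# Same model as in the question, fitted per country and on the merged data.
fit_A      <- lm(a ~ b * c, data = d1)
fit_B      <- lm(a ~ b * c, data = d2)
fit_pooled <- lm(a ~ b * c, data = pooled)

# Merged data again, but letting each country have its own baseline.
fit_adjusted <- lm(a ~ b * c + country, data = pooled)

# Compare the estimated coefficient on b across the four fits.
slopes <- c(
  A        = unname(coef(fit_A)["b"]),
  B        = unname(coef(fit_B)["b"]),
  pooled   = unname(coef(fit_pooled)["b"]),
  adjusted = unname(coef(fit_adjusted)["b"])
)
print(round(slopes, 3))

# Full coefficient tables, including p-values.
print(summary(fit_pooled)$coefficients)
print(summary(fit_adjusted)$coefficients)
```

In this setup the per-country fits and the adjusted fit should all give a slope on b near 0.5, while the pooled fit without country gives a slope much closer to zero. On your real data, comparing lm(a~b*c) with a version that includes a country term (or its interactions with b and c) would be one way to check whether something similar is going on.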