Solved – What type of test to use to determine correlation/relationship between two non-continuous variables

correlation, r, regression

I need to determine if there is any relationship between two count variables.
I have 60+ observations for 4 variables and I want to see if any of the pairs of these variables are significantly correlated with one another.

Mostly I use R, so forgive me if you're not familiar.

I have been using the cor(...,method="pearson") and cor.test() functions to test each pair, but now I'm not so sure that this is the right approach/test.
Would a non-linear regression like glm(...,family="poisson") be more appropriate?
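
For concreteness, here is a minimal sketch of the two approaches being compared. The data are simulated, and the data frame `counts` and its column names (`pink`, `green`, `blue`, `red`) are hypothetical stand-ins for the four count variables, not the real observations.

```r
set.seed(1)

# Hypothetical data: 60 observations of 4 count variables.
# pink and green share a common component, so they tend to move together.
n <- 60
shared <- rpois(n, lambda = 5)
counts <- data.frame(
  pink  = shared + rpois(n, lambda = 3),
  green = shared + rpois(n, lambda = 3),
  blue  = rpois(n, lambda = 8),
  red   = rpois(n, lambda = 8)
)

# Approach 1: Pearson correlation test for every pair of variables
pairs_idx <- combn(names(counts), 2)  # 2 x 6 matrix of variable names
pearson_p <- apply(pairs_idx, 2, function(v) {
  cor.test(counts[[v[1]]], counts[[v[2]]], method = "pearson")$p.value
})
names(pearson_p) <- apply(pairs_idx, 2, paste, collapse = "~")
print(pearson_p)

# Approach 2: Poisson regression of one count on another (one pair shown)
fit <- glm(pink ~ green, family = poisson, data = counts)
print(summary(fit)$coefficients)
```

Note that six pairwise tests are run here without any correction for multiple comparisons, and that the Poisson model treats one variable as the response, so unlike a correlation it is not symmetric in the two variables.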

I started thinking like this because when I looked at a histogram of the counts across my observations, I noticed that there seemed to be a slight tendency for the pink and green variables to go up and down together.

I produced a scatter plot of each of the variables plotted against each of the other variables. I used the tests mentioned above to try to quantify this relationship and to test whether it was real or just noise.


Best Answer

What you are doing is exploratory data analysis. Because you looked at your data first and then decided to test something specific based on what you saw, sample statistics (like r values) will be biased, and inferential statistics (like p-values) will not mean what they are purported to mean. (If that doesn't make sense, you may want to read my answer to a different question here, which should make the underlying ideas more understandable, although they are applied in a different context.) To be clear, I have nothing against exploratory data analysis; I'm quite fond of it. It's just important to realize which game you're playing and what that means for how you think about what you're doing and what you find.

With that in mind, I would take the log of all your counts. Looking at your numbers, I would probably use log base 2, so that each unit is interpretable as a doubling of the underlying count. Then I would simply make a scatterplot matrix overlaid with loess lines. In R, the car package will do this for you and make it pretty: the function is scatterplotMatrix() (called scatterplot.matrix() in older versions of the package). I would then just use my judgment to assess the situation, because of the bias issues noted above.

If you want a formal model and hypothesis tests, I would figure out what I thought might be true (and whether it is interesting and important, etc.) and then gather a new dataset specifically to test that hypothesis.
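
The log-then-plot suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original analysis: the data are simulated, the names `counts` and `log_counts` are hypothetical, and 1 is added before taking logs so that zero counts stay finite (an assumption not made in the answer). The base R call uses `pairs()` with `panel.smooth`, which draws a lowess smoother rather than loess, as a dependency-free stand-in; the car version runs only if that package is installed.

```r
set.seed(42)

# Hypothetical data: 60 observations of 4 count variables
n <- 60
shared <- rpois(n, lambda = 5)
counts <- data.frame(
  pink  = shared + rpois(n, lambda = 3),
  green = shared + rpois(n, lambda = 3),
  blue  = rpois(n, lambda = 8),
  red   = rpois(n, lambda = 8)
)

# Log base 2: one unit on this scale is a doubling of (count + 1).
# The +1 is an assumption added here so that zero counts do not become -Inf.
log_counts <- log2(counts + 1)

# Base R scatterplot matrix with a lowess smoother in each panel
pairs(log_counts, panel = panel.smooth, main = "log2(count + 1)")

# If the car package is available, its scatterplot matrix adds
# smoothed lines and marginal densities
if (requireNamespace("car", quietly = TRUE)) {
  car::scatterplotMatrix(log_counts)
}
```

The plots are meant to be read by eye, in keeping with the exploratory framing; no test statistics are computed.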
