Solved – How to test the linearity between two non-normally distributed variables

linear, pearson-r, regression, spearman-rho

I have two variables $(x_i, y_i), \; i = 1, \dots, 300$, and I would have liked to fit a linear regression to them, but as you can see in the scatterplot below, the linear trend is very weak.
[image]

As I need to justify this in an essay, I would like a measure of the amount of linearity. I used the Pearson product-moment correlation coefficient and obtained a value of -0.1559585. However, after testing the normality of the variables with a Shapiro-Wilk test, I found that the X values are not normally distributed, so I concluded that I cannot use the Pearson coefficient. I read that I could compute Spearman's rank correlation coefficient instead, since the X values don't follow a normal distribution, but that coefficient estimates the monotonic association between X and Y, whereas I would like to quantify the linearity between X and Y. Do you know how I can compute a quantity that expresses this, please?

Thank you very much.

Edit: The Q-Q plot of X is the following:

[image: Q-Q plot of X]

Best Answer

Why are you even looking at the distribution of $X$? It has no effect on whether or not the relationship between $X$ and $Y$ is linear. But that aside, Pearson's correlation measures the strength of linear association, period. No distributional assumptions are needed to use it as a descriptive measure; normality only comes into play for some of the tests and confidence intervals built on top of it. Just look at a scatterplot of the points (which, by the way, you haven't shown; you've provided a Q-Q plot) to see whether it's linear, and report the correlation.
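To make the point about Pearson's correlation concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration (simulated data, the exponential distribution for $X$, the slope and noise level, the seed); it is not the asker's data. It computes the sample Pearson coefficient for a clearly non-normal $X$, once with a $Y$ that depends linearly on $X$ and once with a $Y$ unrelated to $X$.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed (arbitrary), so the run is reproducible

# X is strongly right-skewed (exponential), so it is clearly not normal.
x = [rng.expovariate(1.0) for _ in range(300)]

# Case 1: Y depends linearly on X, plus a little noise (made-up coefficients).
y_linear = [2.0 - 0.5 * xi + rng.gauss(0.0, 0.1) for xi in x]

# Case 2: Y is generated independently of X.
y_noise = [rng.gauss(0.0, 1.0) for _ in x]

r_linear = pearson_r(x, y_linear)
r_noise = pearson_r(x, y_noise)
print(f"linear case:    r = {r_linear:.3f}")
print(f"unrelated case: r = {r_noise:.3f}")
```

In the first case the coefficient should come out close to -1 even though $X$ is far from normal, and in the second it should be close to 0. The skewness of $X$ does not get in the way of the coefficient describing how linear the relationship is.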

Also, goodness-of-fit tests will almost always reject the null hypothesis once the sample size is reasonably large, because real data are never exactly normal, so they shouldn't be relied on too much.
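A rough way to see this sample-size effect is to simulate it. The sketch below is only an illustration under stated assumptions. It uses the Jarque-Bera test rather than Shapiro-Wilk, purely because Jarque-Bera fits in a few lines of standard-library Python. The data are simulated as a standard normal with a small quadratic distortion, and the distortion size `a = 0.05`, the sample sizes, and the seed are all arbitrary choices.

```python
import math
import random


def jarque_bera(sample):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera normality test."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    stat = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under normality the statistic is approximately chi-square with 2 degrees
    # of freedom, whose survival function is exp(-stat / 2).
    return stat, math.exp(-stat / 2.0)


def slightly_skewed(n, rng, a=0.05):
    """Draw n values that are normal apart from a small quadratic distortion."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        out.append(z + a * (z * z - 1.0))
    return out


rng = random.Random(1)  # fixed seed (arbitrary), so the run is reproducible
results = {}
for n in (50, 300, 20000):
    stat, p = jarque_bera(slightly_skewed(n, rng))
    results[n] = (stat, p)
    print(f"n = {n:>6}: JB = {stat:8.2f}, p = {p:.3g}")
```

The departure from normality is the same at every sample size; only the amount of data changes. At the largest sample size the p-value is essentially zero, so the test rejects a deviation that would be hard to see in a histogram.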