There are two issues at play here: The mathematics of statistics, and the conventions of communication of statistics. You're right that it's unconventional to report $R^2$ for a correlation, at least in most fields. But there's nothing wrong with it mathematically.
You can see this more clearly if you consider the case of simple univariate linear regression (a regression model with one continuous outcome and one continuous predictor). To demonstrate, I'll use the iris
dataset, which comes built into R. Here are the first six lines:
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
I can calculate the correlation between Sepal.Length and Sepal.Width:
> cor(iris$Sepal.Length, iris$Sepal.Width)
[1] -0.1175698
I'll square that correlation and save it as Rsq for comparison with the regression output:
> r <- cor(iris$Sepal.Length, iris$Sepal.Width)
> Rsq <- r^2
A simple linear regression predicting Sepal.Length from Sepal.Width:
> summary(lm(Sepal.Length ~ Sepal.Width, data = iris))
Call:
lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Residuals:
    Min      1Q  Median      3Q     Max
-1.5561 -0.6333 -0.1120  0.5579  2.2226

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.5262     0.4789   13.63   <2e-16 ***
Sepal.Width  -0.2234     0.1551   -1.44    0.152
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8251 on 148 degrees of freedom
Multiple R-squared: 0.01382, Adjusted R-squared: 0.007159
F-statistic: 2.074 on 1 and 148 DF, p-value: 0.1519
> Rsq
[1] 0.01382265
Note that the Multiple R-squared statistic reported is exactly the same as the squared correlation between the two variables. Of course, this works just as well if you reverse which variable is the predictor and which is the outcome in the regression model:
> summary(lm(Sepal.Width ~ Sepal.Length, data = iris))
Call:
lm(formula = Sepal.Width ~ Sepal.Length, data = iris)
Residuals:
    Min      1Q  Median      3Q     Max
-1.1095 -0.2454 -0.0167  0.2763  1.3338

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.41895    0.25356   13.48   <2e-16 ***
Sepal.Length -0.06188    0.04297   -1.44    0.152
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4343 on 148 degrees of freedom
Multiple R-squared: 0.01382, Adjusted R-squared: 0.007159
F-statistic: 2.074 on 1 and 148 DF, p-value: 0.1519
When you have more than one predictor in a regression model, then $R^2$ is the squared multiple correlation (that is, the squared correlation between the observed outcome and the model's fitted values) instead of just the squared bivariate correlation. But the idea behind it is very much the same.
The conventions around reporting statistics often obscure how similar many of our tests and measures are; $r$ and $R^2$ are a great example of that.
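The multiple-correlation identity is easy to check by hand in any language. Here is a minimal pure-Python sketch with made-up data (two predictors, least squares solved via the centered normal equations) confirming that $R^2$ equals the squared correlation between the outcome and the fitted values:

```python
import statistics as st

# Made-up data: one outcome y, two predictors x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [1.2, 1.9, 3.4, 3.1, 5.2, 4.8]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Center everything so the intercept drops out of the normal equations.
c1 = [v - st.mean(x1) for v in x1]
c2 = [v - st.mean(x2) for v in x2]
cy = [v - st.mean(y) for v in y]

# Two-predictor least squares via Cramer's rule on the 2x2 normal equations.
det = dot(c1, c1) * dot(c2, c2) - dot(c1, c2) ** 2
b1 = (dot(c1, cy) * dot(c2, c2) - dot(c2, cy) * dot(c1, c2)) / det
b2 = (dot(c1, c1) * dot(c2, cy) - dot(c1, c2) * dot(c1, cy)) / det
yhat = [st.mean(y) + b1 * u + b2 * v for u, v in zip(c1, c2)]

# R^2 from its definition ...
ss_tot = dot(cy, cy)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
r_squared = 1 - ss_res / ss_tot

# ... versus the squared correlation between y and the fitted values.
ch = [f - st.mean(yhat) for f in yhat]
multiple_r = dot(cy, ch) / (dot(cy, cy) * dot(ch, ch)) ** 0.5

print(abs(r_squared - multiple_r ** 2) < 1e-9)  # True: they are the same quantity
```

With a single predictor the fitted values are just a linear rescaling of that predictor, which is why the multiple correlation collapses to the ordinary bivariate $r$.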
The coefficient of determination is defined for the model as a whole, not for individual variables. However, there is a technique called analysis of variance (ANOVA) which can roughly be thought of as breaking $R^2$ into contributions from each variable.
Recall that the coefficient of determination is defined in terms of the sums of squares of residuals:
$$
\begin{align}
R^2 & = 1 - \frac{SS_\text{res}}{SS_\text{tot}} \\
SS_\text{tot} & = \sum_{i=1}^n (y_i - \bar{y})^2 \\
SS_\text{res} & = \sum_{i=1}^n (y_i - \hat{y}_i)^2
\end{align}
$$
where $\hat{y}$ is the vector of the model's predictions. Since we can't compute $\hat{y}$ without considering all of the variables in the model, $R^2$ is a property of the model as a whole.
But look at the equation for $SS_\text{tot}$ again. It has exactly the same form as $SS_\text{res}$ would for a trivial model with only an intercept term; such a model would predict $\hat{y}_i = \bar{y}$ for all $i$. This suggests that we are not comparing one model to some platonic ideal, but actually comparing two different models. This insight can be generalized into a chain of models.
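To make that concrete, here is a tiny pure-Python sketch (with made-up numbers) confirming that $SS_\text{tot}$ is literally the residual sum of squares of the intercept-only model:

```python
import statistics as st

y = [3.1, 4.0, 2.5, 5.2, 4.4]   # made-up outcome values
ybar = st.mean(y)

# SS_tot from its definition
ss_tot = sum((yi - ybar) ** 2 for yi in y)

# SS_res of the intercept-only model, which predicts yhat_i = ybar for all i
yhat = [ybar] * len(y)
ss_res_intercept_only = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))

print(ss_tot == ss_res_intercept_only)  # True: they are the same sum
```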
Starting from the intercept-only model (for which $SS_0 = SS_\text{tot}$) and adding one variable at a time, the residual sum of squares $SS_j$ shrinks at each step. The quantity $\frac{SS_{j-1} - SS_j}{SS_\text{tot}}$ can be interpreted as the "amount of variance explained" by the $j$-th variable, and these contributions, together with the unexplained remainder, telescope to one:

$$ \frac{SS_0 - SS_1}{SS_\text{tot}} + \frac{SS_1 - SS_2}{SS_\text{tot}} + \cdots + \frac{SS_{k-1} - SS_k}{SS_\text{tot}} + \frac{SS_k}{SS_\text{tot}} = 1 $$

As a concrete example, here is the output of the anova() function applied to a linear model fit to the built-in airquality dataset:
Analysis of Variance Table

Response: Ozone
           Df Sum Sq Mean Sq F value    Pr(>F)
Solar.R     1  14780   14780 33.9704 6.216e-08 ***
Wind        1  39969   39969 91.8680 5.243e-16 ***
Temp        1  19050   19050 43.7854 1.584e-09 ***
Month       1   1701    1701  3.9101   0.05062 .
Day         1    619     619  1.4220   0.23576
Residuals 105  45683     435
This is called the "sequential" analysis of variance. The Sum Sq column sums to the total sum of squares of the entire dataset, so we can see that Wind explains roughly twice as much variance as Temp. This interpretation is subject to many caveats: it is sensitive to the order in which variables are added, the F statistics and associated p-values are only meaningful under the usual linear-model assumptions, and so on. Nevertheless, if we take that Sum Sq column and divide by the total sum of squares:
Solar.R   0.12
Wind      0.33
Temp      0.16
Month     0.01
Day       0.01
Residuals 0.38
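That division is just arithmetic on the printed Sum Sq values; a short Python sketch (numbers copied from the ANOVA table above) reproduces it:

```python
# Sum Sq values copied from the sequential ANOVA table above.
sum_sq = {"Solar.R": 14780, "Wind": 39969, "Temp": 19050,
          "Month": 1701, "Day": 619, "Residuals": 45683}
total = sum(sum_sq.values())              # total sum of squares

for name, ss in sum_sq.items():
    print(f"{name:<9} {ss / total:.2f}")  # proportion of variance
```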
We get a table where every line item is roughly analogous to an "$R^2$" for each variable (plus one line item for the unexplained residual), although that terminology is never used as far as I know; people talk about the proportion of variance explained instead.
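The chain-of-models bookkeeping is easy to replicate outside R as well. Here is a pure-Python sketch with made-up data (two predictors added in the order x1 then x2), showing that the per-variable shares plus the residual share telescope to one:

```python
import statistics as st

# Made-up data; predictors are added in the order x1, then x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y  = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Center everything so the intercept can be ignored in each fit.
c1 = [v - st.mean(x1) for v in x1]
c2 = [v - st.mean(x2) for v in x2]
cy = [v - st.mean(y) for v in y]

# SS_0: residual sum of squares of the intercept-only model (= SS_tot).
ss0 = dot(cy, cy)

# SS_1: residual SS after adding x1 (simple regression, closed-form slope).
b = dot(c1, cy) / dot(c1, c1)
ss1 = sum((yv - b * u) ** 2 for yv, u in zip(cy, c1))

# SS_2: residual SS after also adding x2 (two-predictor fit, Cramer's rule).
det = dot(c1, c1) * dot(c2, c2) - dot(c1, c2) ** 2
b1 = (dot(c1, cy) * dot(c2, c2) - dot(c2, cy) * dot(c1, c2)) / det
b2 = (dot(c1, c1) * dot(c2, cy) - dot(c1, c2) * dot(c1, cy)) / det
ss2 = sum((yv - b1 * u - b2 * v) ** 2 for yv, u, v in zip(cy, c1, c2))

share_x1  = (ss0 - ss1) / ss0   # variance "explained by" x1
share_x2  = (ss1 - ss2) / ss0   # additional variance explained by x2
share_res = ss2 / ss0           # unexplained remainder

print(abs(share_x1 + share_x2 + share_res - 1.0) < 1e-9)  # True: telescopes to 1
```

Reordering the predictors changes the individual shares (that is the order-sensitivity caveat above) but never their total.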
Here are some additional resources if you want to read further:
- https://math.stackexchange.com/questions/1792351/sequential-anova-r
- https://astrostatistics.psu.edu/su07/R/html/stats/html/anova.lm.html
- http://www-ist.massey.ac.nz/dstirlin/CAST/CAST/HseqRegnSsq/seqRegnSsq4.html
Best Answer
I notice that for these sorts of questions there is always a lot of pedantry in the community about the use of the term "correlation". We non-statisticians use the term to generally mean "relationship", but some people might not read it that way. So, as others have told you, the correlation coefficient only measures linear association, so it isn't the right summary for a non-linear relationship such as a quadratic one. However, you can compute the root mean squared error and adjusted R-squared, which will tell you about the "goodness of fit" of your model. You can also do an F-test, which will tell you how much better your model is compared to a degenerate model consisting of only a constant term. All of these measures can be computed in Matlab using the function fitnlm. I know it's been a while since this question was posted, so you have probably figured this out, but this could still help others. Best of luck.
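For readers who want to see the mechanics rather than Matlab's fitnlm, here is a pure-Python sketch with made-up data that fits a quadratic by least squares (treating $x$ and $x^2$ as two predictors) and computes RMSE, R-squared, adjusted R-squared, and the F-statistic against the constant-only model:

```python
import statistics as st

# Made-up data with a clearly nonlinear (roughly quadratic) trend.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y = [8.9, 3.8, 1.2, 0.1, 1.1, 4.2, 8.8, 16.1]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Treat x and x^2 as two predictors; center them so the intercept drops out.
xsq = [v * v for v in x]
c1 = [v - st.mean(x) for v in x]
c2 = [v - st.mean(xsq) for v in xsq]
cy = [v - st.mean(y) for v in y]

# Least-squares quadratic fit y ~ 1 + x + x^2 via Cramer's rule.
det = dot(c1, c1) * dot(c2, c2) - dot(c1, c2) ** 2
b1 = (dot(c1, cy) * dot(c2, c2) - dot(c2, cy) * dot(c1, c2)) / det
b2 = (dot(c1, c1) * dot(c2, cy) - dot(c1, c2) * dot(c1, cy)) / det
resid = [yv - b1 * u - b2 * v for yv, u, v in zip(cy, c1, c2)]

n, p = len(y), 3                  # n observations, p fitted parameters
ss_tot = dot(cy, cy)              # residual SS of the constant-only model
ss_res = dot(resid, resid)
r_squared = 1 - ss_res / ss_tot
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p)
rmse = (ss_res / n) ** 0.5
# F-statistic comparing the quadratic model to the constant-only model
f_stat = ((ss_tot - ss_res) / (p - 1)) / (ss_res / (n - p))

print(f"RMSE={rmse:.3f}  R2={r_squared:.3f}  adjR2={adj_r_squared:.3f}  F={f_stat:.1f}")
```

The F-statistic here plays exactly the role described above: it compares the fitted model against the degenerate constant-only model.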