Asking a separate question because, whilst this has been answered for polynomial regression, the solution doesn't work for me.
I'm performing a simple linear regression. Both R and Excel give the same p-values, $R^2$ values and F statistics, but they give different coefficients (and therefore different equations to put on the graph).
This is the data:
suva std_dev heather sedge sphagnum
12.880545 0.061156645 0.15 0.525 0
7.098873327 0.026878039 0.2275 0 0
8.660688381 0.04037841 0.425 0.25 0
7.734546932 0.021618446 0.225 0.3875 0
16.70696048 0.103626684 0.15 0.075 0
9.763315183 0.013387158 0.25 0.075 0
12.91735434 0.008076468 0.22 0.22 0
19.94153851 0.150798057 0.0375 0.35 0.225
17.25115559 0.052229596 0.0625 0.2625 0.225
15.38596941 0.05429447 0.1125 0.45 0.025
15.53714185 0.05933884 0.1625 0.525 0.0625
14.11551229 0.064579437 0.1875 0.35 0.1375
14.88575569 0.0189853 0.3375 0.3 0
12.32229733 0.043085602 0.0875 0.1375 0
17.23861185 0.071705699 0.15 0.1375 0
11.50832463 0.1125 0.0875 0.075
14.4810484 0.078476821 0.0375 0.125 0.0625
9.110262652 0.077306938 0.145 0.35 0.0125
10.8571733 0.02681341 0.0375 0.525 0
9.589339421 0.01892435 0.2275 0 0
7.260373588 0.014538237 0.425 0.25 0
11.11099161 0.022802578 0.225 0.3875 0
10.81488848 0.047587818 0.15 0.075 0
8.224131957 0.031126904 0.25 0.075 0
8.818607863 0.002855409 0.22 0.22 0
11.53999863 0.031465613 0.0375 0.35 0.225
14.92784964 0.069998663 0.0625 0.2625 0.225
9.666480932 0.02387741 0.1125 0.45 0.025
12.51000758 0.016960259 0.1625 0.525 0.0625
13.32611463 0.033670382 0.1875 0.35 0.1375
16.76535191 0.029613698 0.3375 0.3 0
11.24615281 0.008440059 0.0875 0.1375 0
10.60564875 0.003930792 0.15 0.1375 0
11.82909125 0.036017582 0.1125 0.0875 0.075
18.2337185 0.143451512 0.0375 0.125 0.0625
10.6226222 0.020561242 0.145 0.35 0.0125
Excel gives the following output:
Regression Statistics
Multiple R          0.420431825
R Square            0.176762919
Adjusted R Square   0.152550064
Standard Error      3.054032651
Observations        36

ANOVA
             df   SS            MS            F            Significance F
Regression    1   68.0914386    68.0914386    7.300374813  0.010676279
Residual     34   317.1219248   9.327115435
Total        35   385.2133634

             Coefficients    Standard Error   t Stat         P-value
Intercept     14.69082425    0.996953237      14.7357205     2.50758E-16
Heather%     -13.78771857    5.102932589     -2.701920579    0.010676279

             Lower 95%       Upper 95%        Lower 95.0%    Upper 95.0%
Intercept     12.66477151    16.71687699      12.66477151    16.71687699
Heather%     -24.15812531   -3.41731183      -24.15812531   -3.417311838
Whilst R gives the following output:
HS <- lm(heather ~ suva, data=data1)
summary(HS)

Call:
lm(formula = heather ~ suva, data = data1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.149941 -0.063070 -0.009812  0.038370  0.225804

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.326633   0.060733   5.378 5.54e-06 ***
suva        -0.012820   0.004745  -2.702   0.0107 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09313 on 34 degrees of freedom
Multiple R-squared: 0.1768, Adjusted R-squared: 0.1526
F-statistic: 7.3 on 1 and 34 DF, p-value: 0.01068
Why are they different in terms of their coefficients? Which one is correct?
Best Answer
The difference between the coefficients is that the roles of $x$ and $y$ are reversed in one case. The Excel output regresses suva on heather (suva is the response), while the R call `lm(heather ~ suva)` regresses heather on suva. These are two different regression lines, so their slopes and intercepts differ. R reproduces the Excel coefficients if you swap the variables in the formula, i.e. `lm(suva ~ heather, data=data1)`. Neither set of coefficients is "wrong"; which one you want depends on which variable you treat as the response.
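To illustrate the reversal numerically (a sketch with made-up data, in NumPy rather than the answer's original R code), the same least-squares machinery gives two different lines depending on which variable is treated as the response:

```python
# Sketch: simple linear regression fitted in both directions, mirroring
# lm(y ~ x) versus lm(x ~ y). The data here are synthetic, for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(scale=0.8, size=50)

slope_yx, intercept_yx = np.polyfit(x, y, 1)   # regress y on x
slope_xy, intercept_xy = np.polyfit(y, x, 1)   # regress x on y

# Two different lines: slope_xy is NOT the reciprocal of slope_yx,
# but the product of the two slopes equals the squared correlation.
r = np.corrcoef(x, y)[0, 1]
print(slope_yx, slope_xy, slope_yx * slope_xy, r**2)
```

Regressing $y$ on $x$ minimises vertical distances to the line, while regressing $x$ on $y$ minimises horizontal ones, which is why the second slope is not simply the reciprocal of the first.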
Why, then, is $R^2$ still the same?
There is a certain symmetry in the situation: in simple linear regression, the slope coefficient is the correlation coefficient scaled by the ratio of the standard deviations of the $y$ and $x$ data:
$$\hat\beta_{y \sim x} = r_{xy} \frac{s_y}{s_x}$$
The standard deviation of the model's fitted values is then:
$$s_{mod} = \hat\beta_{y \sim x} s_x = r_{xy} s_y$$
and squaring the ratio of the model standard deviation to the standard deviation of the data gives:
$$R^2 = \left( \frac{s_{mod}}{s_y} \right)^2= r_{xy}^2$$
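Swapping $x$ and $y$ therefore changes the slope from $r_{xy}\, s_y/s_x$ to $r_{xy}\, s_x/s_y$, but the $R^2$ of both fits is $r_{xy}^2$. A quick numerical check of these identities on made-up data (a sketch in NumPy):

```python
# Verify: slope of y ~ x equals r * s_y / s_x, and R^2 equals r^2
# regardless of which variable is the response. Synthetic data only.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
s_x, s_y = np.std(x, ddof=1), np.std(y, ddof=1)
slope = np.polyfit(x, y, 1)[0]          # slope of y regressed on x

def r_squared(u, v):
    """R^2 = 1 - SS_res / SS_tot for the regression of v on u."""
    b, a = np.polyfit(u, v, 1)
    resid = v - (b * u + a)
    return 1 - resid @ resid / ((v - v.mean()) @ (v - v.mean()))

print(np.isclose(slope, r * s_y / s_x))
print(np.isclose(r_squared(x, y), r**2), np.isclose(r_squared(y, x), r**2))
```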