Solved – What do r, R-squared, and the residual standard deviation tell us about a linear relationship?

Tags: linear, pearson-r, r, regression, regression-coefficients

A little background
I'm working on interpreting a regression analysis, but I get really confused about the meaning of r, R-squared, and the residual standard deviation.
I know the definitions:

Characterizations

r measures the strength and direction of a linear relationship between
two variables on a scatterplot.

R-squared is a statistical measure of how close the data are to the
fitted regression line.

The residual standard deviation is a statistical term used to describe
the standard deviation of points formed around a linear function, and
is an estimate of the accuracy of the dependent variable being
measured. (I don't know what the units are; any information about the units would be helpful.)

(sources: here)
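For concreteness, these three quantities have standard formulas. Writing $\bar{x}$, $\bar{y}$ for the sample means and $\hat{y}_i$ for the fitted values from the regression line:

```latex
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2}}
```

Note that $s$ (the residual standard deviation) is a root-mean-square of the residuals $y_i-\hat{y}_i$, so it is expressed in the same units as the dependent variable $y$.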

Question
Although I "understand" the individual characterizations, I do not understand how these terms work together to support a conclusion about the dataset.
I will insert a small example here; maybe it can serve as a guide for answering my question (feel free to use an example of your own!)

Example
This is not a homework question, but I searched my book for a simple example (the dataset I'm currently analyzing is too large and complex to show here).

Twenty plots, each 10 x 4 meters, were randomly chosen in a large field of corn. For each plot, the plant density (number of plants in the plot) and the mean cob weight (gm of grain per cob) were observed. The results are given in the following table:
(source: Statistics for the life sciences)

╔═══════════════╦════════════╗
║ Plant density ║ Cob weight ║
╠═══════════════╬════════════╣
║           137 ║        212 ║
║           107 ║        241 ║
║           132 ║        215 ║
║           135 ║        225 ║
║           115 ║        250 ║
║           103 ║        241 ║
║           102 ║        237 ║
║            65 ║        282 ║
║           149 ║        206 ║
║            85 ║        246 ║
║           173 ║        194 ║
║           124 ║        241 ║
║           157 ║        196 ║
║           184 ║        193 ║
║           112 ║        224 ║
║            80 ║        257 ║
║           165 ║        200 ║
║           160 ║        190 ║
║           157 ║        208 ║
║           119 ║        224 ║
╚═══════════════╩════════════╝

First I will make a scatterplot to visualize the data:
[scatterplot of cob weight against plant density]
Then I can calculate r, R-squared, and the residual standard deviation.
First, the correlation test:

    Pearson's product-moment correlation

data:  X and Y
t = -11.885, df = 18, p-value = 5.889e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9770972 -0.8560421
sample estimates:
       cor 
-0.9417954 

and secondly a summary of the regression line:

Residuals:
    Min      1Q  Median      3Q     Max 
-11.666  -6.346  -1.439   5.049  16.496 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 316.37619    7.99950   39.55  < 2e-16 ***
X            -0.72063    0.06063  -11.88 5.89e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.619 on 18 degrees of freedom
Multiple R-squared:  0.887, Adjusted R-squared:  0.8807 
F-statistic: 141.3 on 1 and 18 DF,  p-value: 5.889e-10

So based on this output: r = -0.9417954, R-squared = 0.887, and residual standard error = 8.619.
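The output above comes from R (`cor.test` and `summary` of an `lm` fit). As a cross-check, the same three numbers can be reproduced from the raw table with a short NumPy sketch:

```python
import numpy as np

# Plant density (X) and cob weight (Y), copied from the table above
X = np.array([137, 107, 132, 135, 115, 103, 102,  65, 149,  85,
              173, 124, 157, 184, 112,  80, 165, 160, 157, 119], dtype=float)
Y = np.array([212, 241, 215, 225, 250, 241, 237, 282, 206, 246,
              194, 241, 196, 193, 224, 257, 200, 190, 208, 224], dtype=float)

r = np.corrcoef(X, Y)[0, 1]              # Pearson correlation (matches cor above)
r_squared = r ** 2                       # for simple regression, R² = r²

slope, intercept = np.polyfit(X, Y, 1)   # least-squares line (matches Estimate column)
resid = Y - (intercept + slope * X)

# Residual standard error: sqrt(SS_res / (n - 2)); in the units of Y (gm per cob)
rse = np.sqrt(np.sum(resid ** 2) / (len(Y) - 2))
```

This also answers the units question from the background: the residual standard error is in the units of the dependent variable, here grams of grain per cob.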
What do these values tell us about the dataset? (see Question)

Best Answer

Those statistics can tell you whether there is a linear component to the relationship, but not much about whether the relationship is strictly linear. A relationship with a small quadratic component can have an r^2 of 0.99. A plot of residuals as a function of the predicted values can be revealing. In Galileo's experiment here https://ww2.amstat.org/publications/jse/v3n1/datasets.dickey.html the correlation is very high, but the relationship is clearly nonlinear.
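The point about a small quadratic component can be illustrated with a synthetic sketch (made-up data, not the Galileo dataset): fit a straight line to data generated from y = x + 0.05x². R² comes out above 0.98, yet the residuals form a clear U-shape, positive at both ends and negative in the middle, which a residuals-vs-fitted plot exposes immediately.

```python
import numpy as np

# Synthetic data with a mild quadratic component (illustrative only)
x = np.arange(1, 21, dtype=float)
y = x + 0.05 * x ** 2

slope, intercept = np.polyfit(x, y, 1)   # straight-line least-squares fit
fitted = intercept + slope * x
resid = y - fitted

r_sq = np.corrcoef(x, y)[0, 1] ** 2
# r_sq exceeds 0.98 even though the true relationship is curved;
# resid is positive at both ends of the x range and negative in the middle.
```

A scatter of `resid` against `fitted` would show the curvature at a glance, which is exactly why the residual plot is more informative here than r or R² alone.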