Solved – Error in a linear regression

errorrregressionstandard error

I have a set of points and I would like to fit a linear regression model to them, where each point has its own error value, and I want to find the gradient of the regression line. How do I calculate the standard error on the gradient? In R particularly would be helpful.

Edit:
I feel like I am misusing these terms horribly, so here is what my data looks like:

enter image description here

with a lm fitted line in R. The gradient of the line corresponds to a physical quantity, I need to find the value and standard error in this quantity.

Edit 2: I feel that I should point out that in this case, all the data values have the same precision. I am interested in both this and the general case though.

Best Answer

One approach that comes to my mind looking at your data is to conduct a simulation study. You have five mean values $\bar y_1,...,\bar y_5$ and five corresponding standard deviations $s_1,...,s_5$. It seems from the plot you provided that standard errors for the groups do not differ much from each other (so sample sizes and standard deviations are probably close for the groups), but I assumed more general case when they differ.

Knowing all this you can simulate different regression slopes by sampling $r=1,...,R$ times new values for $y_1^{(r)},..., y_5^{(r)}$ groups from normal distributions (but other choice is also possible, e.g. $t$-distribution)

$$ y_i^{(r)} \sim \mathrm{Normal}(\bar y_i, s_i) $$

and then estimating regression using those values

$$ y_i^{(r)} = \beta_0^{(r)} + \beta_1^{(r)} x_i + \varepsilon_i^{(r)} $$

using $\beta_0^{(r)}$ and $\beta_1^{(r)}$ values from each of the simulation repetitions you can compute the average slope and intercept and compute confidence intervals for those values and around the regression line applying the same methods like you would do with bootstrap results (e.g. using quantiles).

Below you can see R code for such simulation.

# generating example data

set.seed(123)

N <- 5
n <- sample(20, N, replace = TRUE)
s <- runif(N, 0, 3)
m <- runif(N, 0, 5)
X <- rnorm(sum(n), rep(m, n), rep(s, n))
x <- tapply(X, rep(1:N, n), mean)
y <- -6.2 * x + 2 + rnorm(N)

# simulation

f <- function() {
  ysamp <- rnorm(y, y, s)
  fit <- lm(ysamp ~ x)
  out <- c(coef(fit),
           fitted(fit),
           ysamp)
  names(out)[-c(1:2)] <- paste0(rep(c("yhat", "ysim"), each = N), 1:5)
  out
}

sim <- replicate(1e3, f())
coef <- rowMeans(sim[1:2, ])
quant.ci <- apply(sim[3:(2+N), ], 1, quantile, c(.025, .975))

enter image description here