Solved – Calculate uncertainty of linear regression slope based on data uncertainty

regression, uncertainty

How to calculate uncertainty of linear regression slope based on data uncertainty (possibly in Excel/Mathematica)?

Example:
[Example plot: the data points with error bars, together with the lines y = 2x and y = x/2]
Let's have data points (0,0), (1,2), (2,4), (3,6), (4,8), …, (8,16), but each y value has an uncertainty of 4. Most functions I found would calculate the uncertainty as 0, since the points perfectly match the function y = 2x. But, as shown in the picture, y = x/2 matches the points as well. It's an exaggerated example, but I hope it shows what I need.

EDIT: To explain a bit more: while every point in the example has a definite y value, we pretend we don't know whether it's true. For example, the first point (0,0) could actually be (0,6) or (0,-6) or anything in between. I'm asking if there is an algorithm in any of the popular programs that takes this into account. In the example, the points (0,6), (1,6.5), (2,7), (3,7.5), (4,8), …, (8,10) still fall within the uncertainty range, so they might be the right points; the line that connects them has the equation y = x/2 + 6, while the equation we get without factoring in the uncertainties is y = 2x + 0. So the uncertainty of k is 1.5 and of n is 6.

TL;DR: In the picture there is a line y = 2x that was calculated with a least-squares fit, and it fits the data perfectly. I'm trying to find how much k and n in y = kx + n can change and still fit the data, given the known uncertainty in the y values. In my example, the uncertainty of k is 1.5 and of n it's 6. The image shows the 'best' fit line and a line that just barely fits the points.
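(For reference, many numerical packages can propagate known y-uncertainties into standard errors on the fitted slope and intercept via weighted least squares. Below is a minimal sketch in Python, assuming NumPy and SciPy are available and taking each point's standard deviation to be 3 as in the answer below; note that these are one-standard-deviation errors from the fit covariance, not the "just barely fits" extremes described above.)

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the example: y = 2x exactly, x = 0..8
x = np.arange(9, dtype=float)
y = 2.0 * x
sigma_y = np.full_like(y, 3.0)   # assumed per-point standard deviation (the answer below uses 3)

def line(x, k, n):
    """Straight line y = k*x + n."""
    return k * x + n

# absolute_sigma=True makes curve_fit treat sigma_y as actual standard
# deviations rather than relative weights, so pcov reflects them directly.
popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)
k_fit, n_fit = popt
k_err, n_err = np.sqrt(np.diag(pcov))

print(f"k = {k_fit:.3f} +/- {k_err:.3f}")
print(f"n = {n_fit:.3f} +/- {n_err:.3f}")
```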

Best Answer

Responding to "I'm trying to find how much $k$ and $n$ in $y = k x + n$ can change but still fit the data if we know uncertainty in $y$ values."

If the true relation is linear and the errors in $y$ are independent normal random variables with zero means and known standard deviations, then the $100(1-\alpha)\%$ confidence region for $(k,n)$ is the ellipse for which $\sum (k x_i + n - y_i)^2/\sigma_i^2 < \chi_{d,\alpha}^2$, where $\sigma_i$ is the standard deviation of the error in $y_i$, $d$ is the number of $(x,y)$ pairs, and $\chi_{d,\alpha}^2$ is the upper $\alpha$ fractile of the chi-square distribution with $d$ degrees of freedom.
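As a sketch of how one could test whether a particular $(k, n)$ lies inside this region numerically (Python with NumPy/SciPy, taking every $\sigma_i = 3$ as in the edit below):

```python
import numpy as np
from scipy.stats import chi2

# Data from the question; every sigma_i taken to be 3 (see the edit below)
x = np.arange(9, dtype=float)
y = 2.0 * x
sigma = np.full_like(y, 3.0)

alpha = 0.05
d = len(x)                              # number of (x, y) pairs
threshold = chi2.ppf(1.0 - alpha, d)    # upper-alpha fractile of chi-square with d dof

def inside_region(k, n):
    """True if (k, n) lies inside the 100(1 - alpha)% confidence region."""
    return np.sum(((k * x + n - y) / sigma) ** 2) < threshold

print(inside_region(2.0, 0.0))   # least-squares line y = 2x      -> True
print(inside_region(0.5, 6.0))   # alternative line y = x/2 + 6   -> also inside
```

Scanning a grid of $(k, n)$ values with this test traces out the same ellipse that the edit below writes in closed form.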

EDIT: Taking the standard error of each $y_i$ to be 3 (i.e., taking the error bars in the plot to represent approximate 95% confidence intervals for each $y_i$ separately), the equation for the boundary of the 95% confidence region for $(k,n)$ is $204 (k-2)^2 + 72n(k-2) + 9n^2 = 152.271$.

[Plot of the 95% confidence region (ellipse) for (k, n)]
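For completeness, the quoted coefficients follow from expanding $\sum_i ((k-2)x_i + n)^2 = (k-2)^2\sum x_i^2 + 2n(k-2)\sum x_i + 9n^2$ (since $y_i = 2x_i$), and multiplying the criterion through by $\sigma^2 = 9$ gives the right-hand side $9\,\chi^2_{9,\,0.05}$. A quick numerical check (Python/SciPy again, purely illustrative):

```python
import numpy as np
from scipy.stats import chi2

x = np.arange(9)      # x = 0, 1, ..., 8; the data satisfy y = 2x exactly
sigma = 3.0

# sum_i ((k - 2)*x_i + n)^2 = A*(k - 2)^2 + B*n*(k - 2) + C*n^2
A = np.sum(x ** 2)    # 204
B = 2 * np.sum(x)     # 72
C = len(x)            # 9

# Multiplying sum/sigma^2 < chi2_{9, 0.05} through by sigma^2 = 9 gives the
# right-hand side of the boundary equation quoted above.
rhs = sigma ** 2 * chi2.ppf(0.95, len(x))

print(A, B, C)          # 204 72 9
print(round(rhs, 3))    # approximately 152.271
```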