Suppose you analyze experimental data with pencil and paper. One simple method is to plot the data in a linearized way with error bars. Suppose you are interested in the slope of this linear function.

Then you may draw the line which visually fits best through all error bars and just calculate the slope.

To estimate the error, Dana Roberts, for example, suggests drawing a line which barely fits within the error bars, calculating its slope, and taking the difference from the best slope as the error, like this:

Wouldn't it be better to determine both the lowest and the highest possible slopes and calculate the difference from the "best slope" in each case? Why stick to just one case?

If you do so, you would likely end up with an asymmetric error, which you could report as such. But what should you do if you want to keep it simple and have your students in the beginner lab report just *one* error? Take the larger of the two errors? Or the mean of the two? What would be the most reasonable approach, and why?
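To make the max/min-slope idea concrete, here is a minimal Python sketch (the data points, error bars, and the endpoints-only rule are all hypothetical choices for illustration, not a standard recipe): tilt the line up and down as far as the first and last error bars allow, and compare both extremes to a least-squares best slope.

```python
import numpy as np

def slope_with_asymmetric_error(x, y, dy):
    """Least-squares best slope, plus the steepest and shallowest slopes
    that still pass through the first and last error bars (a crude
    graphical proxy for 'barely fitting within the error bars')."""
    x, y, dy = map(np.asarray, (x, y, dy))
    m_best, _ = np.polyfit(x, y, 1)          # best-fit slope through all points
    run = x[-1] - x[0]
    m_max = ((y[-1] + dy[-1]) - (y[0] - dy[0])) / run  # tilted up as far as allowed
    m_min = ((y[-1] - dy[-1]) - (y[0] + dy[0])) / run  # tilted down as far as allowed
    return m_best, m_max - m_best, m_best - m_min

# Hypothetical linearized data with equal error bars.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 1.1, 1.9, 3.2]
dy = [0.2, 0.2, 0.2, 0.2]

m, err_up, err_down = slope_with_asymmetric_error(x, y, dy)
print(f"slope = {m:.3f} (+{err_up:.3f} / -{err_down:.3f})")
print(f"single symmetric error (mean of both): {(err_up + err_down) / 2:.3f}")
```

Note that even with identical error bars the two deviations differ, because the extreme slopes are symmetric about the endpoint slope but not about the least-squares slope; averaging the two deviations is one simple way to quote a single error.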

Do you have any further references where this method is discussed in physics or in papers about physics education?

## Best Answer

I like that this question is asking for some intuitive justification and calling for more understanding in error analysis. From a teaching perspective, I think what's most important is that students come up with their own procedure, that it's reasonable, and that they justify it. This encourages them to do the same sort of critical thinking that you showed in formulating the question.

All the things you suggested are pretty similar; they all involve the idea that error bars tell you a range of plausible values, so roughly speaking a line that goes through the error bars is plausible. There are a range of such lines; all these procedures are estimates of that range. You'll get approximately the same answer no matter what you do, and since the error is only approximate anyway, that's fine. There's no reason to favor one over the other very much.

On the other hand, all these suggestions are a bit off the mark in terms of a nuanced understanding of error. I would give credit for them in an introductory class, but I would hope that at least some students would recognize why they aren't good error estimates for a line's slope, and in an advanced class I would definitely encourage students to go further.

If the size of a typical error bar is $\delta y$ and the gap from the first to last $x$-value is $\Delta x$, then all your proposed estimates give numbers of order $\delta y/\Delta x$. That completely ignores $n$, the number of data points!

Instead, students should recognize that finding a line of best fit is a form of averaging the data points. Like all averages, the error in the slope should decrease as $\dfrac{1}{\sqrt{n}}.$ The more data points they have, the better the average should be. It's like measuring the height of 100,000,000 random people. You can know that the average height of men in that population is 177.21 cm, even if the error bar on any individual height is 0.5 cm.

Additionally, the error in the slope is proportional to $\delta y$ and inversely proportional to the spread of the $x$-values. If you apply error propagation formulas to the slope from least-squares linear regression, you'll find $$\delta m \sim \dfrac{\delta y}{\sqrt{n}\, \sigma_x},$$ where $\sigma_x$ is the standard deviation of the $x$-values.
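You can check this scaling numerically with a quick simulation (a sketch with made-up parameters: true slope 2, $\delta y = 0.5$, evenly spaced $x$-values): fit many simulated datasets and compare the scatter of the fitted slopes to $\delta y/(\sqrt{n}\,\sigma_x)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_scatter(n, delta_y=0.5, true_m=2.0, true_b=1.0, trials=2000):
    """Scatter of fitted slopes over many simulated datasets of size n."""
    x = np.linspace(0.0, 10.0, n)
    slopes = [
        np.polyfit(x, true_m * x + true_b + rng.normal(0.0, delta_y, size=n), 1)[0]
        for _ in range(trials)
    ]
    return np.std(slopes), np.std(x)

for n in (10, 40, 160):
    observed, sigma_x = slope_scatter(n)
    predicted = 0.5 / (np.sqrt(n) * sigma_x)
    print(f"n={n:4d}  simulated slope scatter={observed:.4f}  predicted={predicted:.4f}")
```

Quadrupling the number of points over the same $x$-range halves the slope error, exactly the $1/\sqrt{n}$ behavior a graphical min/max-slope estimate misses.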

This is a heuristic based on the basic ideas of error propagation; you can also look up more detailed formulas for the uncertainty in the slope in least-squares linear regression.

It's also worth pointing out that I would want students with that data to be skeptical of the error bars. In your plot, the best-fit line easily fits through every single error bar. If these are 1-sigma error bars, that's very unlikely with eight data points. It appears the error bars are too big; we should question the procedure for generating them, and justify why we have only $y$-errors and not $x$-errors as well.

Students could use other methods to find the uncertainty in the slope as well. For example, they could try to calculate the likelihood of lines of different slopes and use the width of the likelihood function as an estimate of the error in the slope. Another approach is to take the best-fit line, simulate data based on that best-fit line with a random error for each point drawn from some specified distribution that you believe matches your experiment, then run a best-fit algorithm on the simulated data. Repeat that many times and you'll get a range of best-fit slopes. All these methods should again give similar results.
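The simulate-and-refit idea in the last paragraph (often called a parametric bootstrap) can be sketched in a few lines of Python; the data, error size, and number of repetitions below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical measurements: 8 points with a common 1-sigma error bar delta_y.
x = np.linspace(0.0, 7.0, 8)
delta_y = 0.4
y = 1.5 * x + 0.3 + rng.normal(0.0, delta_y, size=x.size)

# Step 1: fit the real data once.
m_best, b_best = np.polyfit(x, y, 1)

# Step 2: simulate fake datasets from the best-fit line and refit each one.
boot_slopes = [
    np.polyfit(x, m_best * x + b_best + rng.normal(0.0, delta_y, size=x.size), 1)[0]
    for _ in range(5000)
]

# Step 3: the spread of the refitted slopes estimates the slope uncertainty.
slope_err = np.std(boot_slopes)
print(f"best-fit slope = {m_best:.3f} ± {slope_err:.3f}")
```

The resulting spread should agree with the analytic least-squares uncertainty, which makes for a good consistency check for students.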

The lesson is that students should not see data analysis as merely applying canned formulas or procedures. Scientists need to make hard decisions about how to analyze their data, and having students do the same gives them an opportunity for ownership and more personal involvement. Go for sense-making and reward anyone doing it, even if they're a little wrong.

For references about student thinking about error analysis, check out the papers of Rebecca Lippman Kung and Saalih Allie.