It's common to see people report variability in their realized sample through the $\bar{x} \pm 2s$ range, where $\bar{x}$ is the realized sample mean and $s$ the realized sample standard deviation.
Beyond describing the realized sample, I am guessing $\bar{x} \pm 2s$ is often reported in the hope that it also provides information about $\mu \pm 2\sigma$.
But unless the sample is very large, $\bar{x}$ and $s$ will be imperfect estimates of $\mu$ and $\sigma$, and $\bar{x} \pm 2s$ will therefore be an inaccurate estimate of $\mu \pm 2\sigma$.
Whether or not it's a good idea to try to estimate $\mu \pm 2\sigma$, this leads me to wonder about standard practices for doing so, and about how to assess the accuracy of the estimated range (including the relationship between accuracy and sample size $n$).
I guess you could construct standard CIs around both $\mu$ and $\sigma$ and somehow "combine" these two CIs by looking at worst-case scenarios to get a "maximum" CI for $\mu \pm 2\sigma$ that you're equally confident in (say by taking the highest/lowest value in the CI for $\mu$ plus/minus twice the highest value in the CI for $\sigma$).
But that seems super hacky and not well-founded theoretically (among other things, I suspect there are issues with $s$ appearing in the CI formula for both $\mu$ and $\sigma$?).
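For concreteness, here is a rough sketch (my own illustration, not a recommendation) of what I mean by that naive combination, assuming i.i.d. normal data and the standard t-based CI for $\mu$ and chi-square-based CI for $\sigma$; I have not analyzed what its actual simultaneous coverage would be.

```python
# Rough sketch of the naive "combine the two CIs" idea (not a recommendation):
# take the worst-case endpoints of a t-based CI for mu and a chi-square-based
# CI for sigma, assuming i.i.d. normal data.
import numpy as np
from scipy import stats

def naive_combined_interval(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)

    # Standard CI for mu.
    t = stats.t.ppf(1 - alpha / 2, n - 1)
    mu_lo, mu_hi = xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n)

    # Upper end of the standard CI for sigma.
    sigma_hi = s * np.sqrt((n - 1) / stats.chi2.ppf(alpha / 2, n - 1))

    # "Worst case" combination aimed at mu +/- 2*sigma.
    return mu_lo - 2 * sigma_hi, mu_hi + 2 * sigma_hi
```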
If that's indeed not a good idea, what would be a better approach to estimating $\mu \pm 2\sigma$ with confidence? In the classical CI sense, and under some reasonable distributional assumptions, is there an observable random interval $[\underline{F}(X_1, \dots, X_n; \alpha), \bar{F}(X_1, \dots, X_n; \alpha)]$ that is guaranteed to include the whole $\mu \pm 2\sigma$ range $100(1-\alpha)\%$ of the time (in a similar way that, $100(1-\alpha)\%$ of the time, the range $\bar{X} \pm t_{\alpha/2,\,n-1} \, (s/\sqrt{n})$ includes $\mu$ itself)?
That is, I am looking for a family of statistics $\underline{F}(X_1, \dots, X_n; \alpha)$ and $\bar{F}(X_1, \dots, X_n; \alpha)$ such that:
$$P\big(\underline{F}(X_1, \dots, X_n; \alpha) \leq \mu - 2\sigma < \mu + 2\sigma \leq \bar{F}(X_1, \dots, X_n; \alpha) \big) = 1-\alpha.$$
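To make the target concrete, here is a small Monte Carlo sketch (my own, assuming i.i.d. normal data) of how one could check this simultaneous-coverage property for a candidate interval of the simple form $\bar{X} \pm k s$. With $k = 2$ the coverage is well below $1-\alpha$ for small $n$, which is what prompts the question.

```python
# Monte Carlo sketch (my own illustration, assuming i.i.d. normal data):
# estimate the probability that an interval of the form xbar +/- k*s
# contains the whole mu +/- 2*sigma range.
import numpy as np

rng = np.random.default_rng(0)

def simultaneous_coverage(k, n, mu=0.0, sigma=1.0, reps=100_000):
    """Estimate P(xbar - k*s <= mu - 2*sigma  and  mu + 2*sigma <= xbar + k*s)."""
    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    covered = (xbar - k * s <= mu - 2 * sigma) & (mu + 2 * sigma <= xbar + k * s)
    return covered.mean()

print(simultaneous_coverage(k=2.0, n=20))  # well below 0.95
```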
Best Answer
As @whuber wrote in the comments, what I am describing is a "Tolerance Interval" (see e.g., Prediction and Tolerance Intervals).
If I understand things correctly so far, one connection with the $\bar{x} \pm 2s$ rule of thumb is that, with normal data, as the sample size $n$ tends to infinity, $\bar{x} \pm 1.96s$ obviously gets closer and closer to including 95% of the population (and, as $n$ tends to infinity, I believe the same holds for any coverage level, be it 95%, 99%, 99.9%, ...).
However, for finite $n$, intervals including 95% of the population with, say, 95% confidence are wider than $\bar{x} \pm 1.96s$ (e.g., the interval is around $\bar{x} \pm 2.75s$ when $n=20$, where the calculations are from https://statpages.info/tolintvl.html, with the underlying statistical assumptions --- in particular, normal data --- described in https://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm).
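If I am reading the NIST page correctly, the two-sided tolerance factor for normal data can be approximated in closed form (Howe's approximation), which reproduces the $\approx 2.75$ figure for $n = 20$ at 95% coverage and 95% confidence. A minimal sketch of my reading of that formula:

```python
# Sketch of the two-sided normal tolerance factor, using the approximation
# given on the NIST handbook page linked above (my reading of it; normal data
# assumed).
import numpy as np
from scipy import stats

def tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Approximate k such that xbar +/- k*s contains at least `coverage`
    of the population with probability `confidence` (two-sided, normal data)."""
    z = stats.norm.ppf((1 + coverage) / 2)        # ~1.96 for 95% coverage
    chi2 = stats.chi2.ppf(1 - confidence, n - 1)  # lower chi-square quantile
    return np.sqrt((n - 1) * (1 + 1 / n) * z ** 2 / chi2)

print(tolerance_factor(20))   # ~2.75, matching the calculator result for n = 20
```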