The middle $5\%$ lies between $z=\pm0.06270678$. That is, $\Phi(-0.06270678)=0.475$ and $1-\Phi(0.06270678)=0.475$, so $2(0.475)=0.95$ of the probability lies outside those bounds and $0.05$ lies inside.
So:
\begin{align}
\mu+0.06270678\sigma & = 12, \\
\mu-0.06270678\sigma & = 6.
\end{align}
Subtracting the second equation from the first gives $2(0.06270678)\sigma$ on the left and $12-6=6$ on the right, so you can solve for $\sigma$. Having found $\sigma$, plug it into either of the two equations above to find $\mu$.
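As a quick numerical check, the system can be solved directly; here is a sketch using `scipy.stats.norm` (note that the constant $0.06270678$ is just $\Phi^{-1}(0.525)$):

```python
from scipy.stats import norm

# z such that Phi(z) = 0.525, the upper bound of the middle 5%
z = norm.ppf(0.525)            # ~0.06270678

# Solve the system: mu + z*sigma = 12, mu - z*sigma = 6
sigma = (12 - 6) / (2 * z)     # ~47.84
mu = 12 - z * sigma            # 9.0

print(mu, sigma)
```

By symmetry you could also read off $\mu=(12+6)/2=9$ directly.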
I'm pretty sure Scipy's algorithm does the following:
- Uniformly scales your sample angles into the range $[0, 2\pi]$
- Converts each sample angle into a "point" on the unit circle
- Finds the mean of those points
- Computes the norm of that mean point; call it $r$
- Computes $\sqrt{\log(1 / r^2)}$
- Scales this value back to the original range of the samples
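If I have the algorithm right, the steps above can be sketched roughly like this (a sketch under that assumption, not scipy's actual implementation; `low` and `high` describe the range your samples live in):

```python
import numpy as np

def circular_std(samples, low=0.0, high=2 * np.pi):
    """Sketch of the circular standard deviation steps described above."""
    # 1. Scale the sample angles into [0, 2*pi]
    ang = (np.asarray(samples) - low) * 2 * np.pi / (high - low)
    # 2-3. Treat each angle as a point (unit phasor) on the unit
    #      circle and take the mean of those points
    mean_point = np.mean(np.exp(1j * ang))
    # 4. Norm of the mean point
    r = np.abs(mean_point)
    # 5. The mysterious formula: sqrt(log(1/r^2))
    s = np.sqrt(np.log(1.0 / r**2))
    # 6. Scale back to the original range
    return s * (high - low) / (2 * np.pi)
```

For tightly clustered angles this agrees closely with the ordinary standard deviation, which is a reassuring sanity check.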
All the steps seem reasonable except maybe #5. Where does that formula come from? To answer that, we first need to ask where the familiar estimator for samples on $\mathbb{R}$ comes from:
$$\hat\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\hat\mu)^2},\qquad \hat\mu=\frac{1}{n}\sum_{i=1}^nx_i$$
I write $\hat\sigma$ and $\hat\mu$ with hats because they are really estimators of the true population values of standard deviation and mean. I'll let Wikipedia do the variance proof for you, i.e. prove that $\mathbf{E}(\hat\sigma^2)=\sigma^2$.
Those results rely on operators that we take for granted when our samples come from $\mathbb{R}$: addition, subtraction, and a commutative multiplication. You have random angles, though, which live on $\mathbb{SO2}$, so we cannot be so carefree.
Rather than trying to guess an estimator of variance for random variables on $\mathbb{SO2}$, we can take an alternative perspective. The normal distribution on $\mathbb{R}$ is special for many reasons, but one is that its variance is exactly one of the two parameters that defines its probability density function (pdf),
$$\rho(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$
We can "wrap" this pdf around the unit circle as described here by stacking up the density at $x$ with the density at $x+2\pi k$ for every $k \in \mathbb{Z}$. Once we do this, we must limit our sample space to $[0, 2\pi]$ (or any interval of length $2\pi$), or else we won't satisfy the definition of a pdf: its integral over the whole sample space must be 1. Make sure you shift your samples into this range before you use a wrapped pdf!
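As a sketch, here is one way to evaluate the wrapped density numerically by truncating the sum at some $|k| \le k_{\max}$ (the function name and cutoff are my own choices, not from any library):

```python
import numpy as np

def wrapped_normal_pdf(theta, mu, sigma, k_max=50):
    """Wrapped normal density at theta in [0, 2*pi): stack up the
    normal density at theta + 2*pi*k for k = -k_max, ..., k_max."""
    k = np.arange(-k_max, k_max + 1)
    x = theta + 2 * np.pi * k
    return np.sum(np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                  / np.sqrt(2 * np.pi * sigma ** 2))
```

Truncating the sum is harmless because the terms decay like $e^{-(2\pi k)^2/(2\sigma^2)}$, and a numerical integral of this function over $[0, 2\pi]$ should come out to 1, as a pdf requires.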
Doing this results in the wrapped normal distribution; call its probability measure $P_o$ and its density $\rho_o$. Perhaps its variance can give us some insight into what a good "circular variance" estimator would look like. To compute the variance, we can't make use of the expectation over $\mathbb{R}$ like we usually do,
$$\mathbf{E}(X) \neq \int_{0}^{2\pi}x\rho_o(x)\,dx$$
for the reason I stated a moment ago. Instead, we'll take our samples as unit phasors (unit-magnitude complex numbers), a valid representation of $\mathbb{SO2}$ parameterized by the real interval $[0, 2\pi]$.
$$\mathbf{E}(X) = \int_{\mathbb{SO2}}x\ dP_o = \int_{0}^{2\pi}e^{j\theta}\rho_o(\theta)d\theta$$
This is what is done here, where they compute the $n$th moment and then extract from it an expression for $\sigma$ that looks much like the scipy equation. In my description of the scipy algorithm, I put the word "point" in quotes because they really treat your samples as unit phasors, not vectors. In the next section of the Wikipedia article, they find an (albeit biased) estimator of this $\sigma$, which is exactly what scipy computes.
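We can check this numerically: for the wrapped normal, the first moment has magnitude $e^{-\sigma^2/2}$, so solving $r = e^{-\sigma^2/2}$ for $\sigma$ gives exactly $\sqrt{\log(1/r^2)}$. A sketch (the sample size, seed, and parameter values are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import circstd

mu, sigma = 1.2, 0.7
rng = np.random.default_rng(42)
theta = rng.normal(mu, sigma, 1_000_000) % (2 * np.pi)  # wrapped samples

# Estimate the first circular moment E[e^{j*theta}]; for the wrapped
# normal its magnitude is e^{-sigma^2/2}
m1 = np.mean(np.exp(1j * theta))
r = np.abs(m1)

# Solving r = e^{-sigma^2/2} for sigma gives sqrt(log(1/r^2)),
# which is the estimator scipy's circstd computes
sigma_hat = np.sqrt(np.log(1.0 / r**2))
print(sigma_hat, circstd(theta))  # both close to 0.7
```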
I would say that this method is "correct" in that it isn't ad hoc. If you find that your estimator makes more sense for your application, perhaps it really is what you want, but I would consider the scipy function to be the "right" way to measure the spread of angles.
Best Answer
$\newcommand{\v}{\operatorname{var}}$ \begin{align} \operatorname{sd}(X-Y) = \sqrt{\v(X-Y)} = \sqrt{\v(X) + \v(Y) -2\operatorname{cov}(X,Y)} \end{align} This reduces to $\sqrt{\v(X) + \v(Y)}$ if $X$ and $Y$ are independent or if the covariance is otherwise $0.$
You have $\v(X-Y) = \v(X) + \v(Y)$ if $X,Y$ are independent.
What you wrote was $\v(X-Y) = \v(X+Y)$ instead of $\v(X-Y) = \v(X)+\v(Y).$ It is also correct written that way, but not useful in this case, since you need to add the variances. The difference between what you wrote and the version in which you add the variances is the kind of thing you need to pay attention to if you're doing mathematics.
It seems implausible to me that the two variables involved would actually be independent, but that could be assumed in an exercise.
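A quick numerical illustration of the independent case (my own sketch, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 3.0, 1_000_000)   # var(X) = 9
y = rng.normal(0.0, 4.0, 1_000_000)   # var(Y) = 16, independent of X

# For independent X, Y: var(X - Y) = var(X) + var(Y) = var(X + Y)
print(np.var(x - y))                    # ~25
print(np.var(x + y))                    # ~25
print(np.sqrt(np.var(x) + np.var(y)))   # sd(X - Y) ~ 5
```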