I'm collecting the $x$- and $y$-axis offsets from the origin of the impact points of rounds I've shot at a target. From the variances I've calculated the standard deviations in the $x$ and $y$ directions, $\sigma_x$ and $\sigma_y$ respectively, and I've also calculated the standard deviation of the root of the sum of their squares (the magnitude of the distance from the origin/bull's eye) as $\sigma_r$. I know these values are deviations around the mean of each variable, which characterizes my shot grouping.

I'm also seeking my standard deviation away from the origin, i.e. I want to know how far I'm deviating from the center of the target versus how far I'm deviating from the calculated center of my grouping. Would this value just be $\sigma^2$ and $\sigma$ computed around a mean of 0 in all the variables, so that I have a measure of deviation from perfect, or is this the incorrect way to go about the problem?

All standard deviations used are population standard deviations, e.g. $$ \sigma^{2} = \frac{1}{n}\sum_{i=1}^{n} (x_{i} - \mu)^{2} \;\; ; \;\; \sigma = \sqrt{\sigma^{2}}, $$ where in the case in question $\mu$ would be taken to be 0, leaving just the average of the squares as the variance.
[Math] Standard Deviation Around an Arbitrary Mean
statistics
Related Solutions
Your guess is correct: least absolute deviations was the method tried first historically. The first to use it were astronomers attempting to combine observations subject to error. Boscovich published this method, along with a geometric solution, in 1755. Laplace used it later in a 1789 work on geodesy, formulating the problem more mathematically and describing an analytical solution.
Legendre appears to be the first to use least squares, doing so as early as 1798 for work in celestial mechanics. However, he supplied no probabilistic justification. A decade later, Gauss (in an 1809 treatise on celestial motion and conic sections) asserted axiomatically that the arithmetic mean was the best way to combine observations, invoked the maximum likelihood principle, and then showed that a probability distribution for which the likelihood is maximized at the mean must be proportional to $\exp(-x^2 / (2 \sigma^2))$ (now called a "Gaussian") where $\sigma$ quantifies the precision of the observations.
The likelihood (when the observations are statistically independent) is the product of these Gaussian terms which, because of the exponential, is most easily maximized by minimizing the negative of its logarithm. Up to an additive constant, the negative log of the product is the sum of the squares (all divided by a constant $2 \sigma^2$, which does not affect the minimization). Thus, even historically, the method of least squares is intimately tied up with likelihood calculations and averaging. There are plenty of other modern justifications for least squares, of course, but this derivation by Gauss, with the almost magical appearance of the Gaussian (which had first appeared some 70 years earlier in De Moivre's work on sums of Bernoulli variables, the Central Limit Theorem), is memorable.
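For concreteness, here is that step written out (a standard restatement under the usual independent-Gaussian assumption, not part of Stigler's account): for observations $x_1, \ldots, x_n$ with common mean $\mu$,
$$ -\log \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i - \mu)^2 / (2\sigma^2)} \;=\; \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \;+\; n \log\!\left(\sigma\sqrt{2\pi}\right), $$
so maximizing the likelihood over $\mu$ amounts to minimizing the sum of squared deviations.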
This story was researched, and is ably recounted, by Stephen Stigler in his *The History of Statistics: The Measurement of Uncertainty before 1900* (1986). Here I have merely given the highlights of parts of chapters 1 and 4.
The mean of $x$ = mean of $y$
This is not true.
The way you should approach this problem is to use the formulas for mean and standard deviation directly: \begin{align*} \text{Mean}(y_1, y_2, \ldots, y_n) &= \frac{y_1 + y_2 + \cdots + y_n}{n} \\ &= \frac{(ax_1 + b) + (ax_2 + b) + \cdots + (ax_n + b)}{n} \\ &= \frac{a(x_1 + x_2 + \cdots + x_n) + nb}{n} \\ &= a \cdot \text{Mean}(x_1, x_2, \ldots, x_n) + b \end{align*}
See if you can do a similar algebraic manipulation for standard deviation.
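If you want to check your algebra numerically afterwards, here is a minimal sketch (the data and the constants $a$, $b$ are made up, not from the question):

```python
import numpy as np

# Made-up data and a linear transform y = a*x + b, purely illustrative.
x = np.array([2.0, 5.0, 7.0, 11.0])
a, b = 3.0, -4.0
y = a * x + b

# The mean picks up both the scale a and the shift b ...
assert np.isclose(y.mean(), a * x.mean() + b)

# ... but the (population) standard deviation picks up only |a|:
# shifting every point by b leaves the spread unchanged.
assert np.isclose(y.std(), abs(a) * x.std())

print(y.mean(), y.std())
```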
Best Answer
The line of reasoning in the question is correct.
Calculating the moments about the origin differs from calculating the central moments only in setting $\mu = 0$ in the formulas above.
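For concreteness, here is a minimal numerical sketch of the distinction (the arrays of impact offsets are hypothetical, not data from the question): the deviation about the group centre uses $\mu = \bar{x}$, the deviation from the bull's eye uses $\mu = 0$, and the two are linked by the population identity $E[x^2] = \sigma_x^2 + \bar{x}^2$.

```python
import numpy as np

# Hypothetical x/y impact offsets from the bull's eye (same units); illustrative only.
x = np.array([ 1.2, -0.4,  0.8,  2.1, -0.3,  1.5])
y = np.array([-0.7,  0.9,  0.2, -1.1,  0.6, -0.2])

# Population standard deviations about the group centre (np.std uses ddof=0 by default).
sigma_x = x.std()
sigma_y = y.std()

# Root-mean-square offsets from the origin: the same formula with mu = 0.
sigma_x0 = np.sqrt(np.mean(x**2))
sigma_y0 = np.sqrt(np.mean(y**2))

# Check the decomposition E[x^2] = Var(x) + mean(x)^2 on both axes.
assert np.isclose(sigma_x0**2, sigma_x**2 + x.mean()**2)
assert np.isclose(sigma_y0**2, sigma_y**2 + y.mean()**2)

print("about group centre:", sigma_x, sigma_y)
print("about bull's eye:  ", sigma_x0, sigma_y0)
```

The same decomposition shows why the origin-referenced deviation can never be smaller than the grouping deviation: any bias of the group centre away from the bull's eye adds $\bar{x}^2$ (and $\bar{y}^2$) to the corresponding variances.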