My problem is as follows:
I drop 40 balls at once from a certain point, a few meters above the floor.
The balls roll and come to rest.
Using computer vision, I calculate the center of mass in the X-Y plane.
I am only interested in the distance from the center of mass to each ball, which is calculated using simple geometry.
Now, I want to know the one-sided standard deviation from the center.
So, I would be able to know that a certain number of balls are within one std radius, more balls within 2*std radius and so on.
How do I calculate the one-sided standard deviation? A normal approach would state that half of the balls be on the "negative side" of 0 mean. This of course makes no sense in this experiment.
Do I have to make sure that the ball positions follow a normal distribution?
Thank you for any help.
Solved – How to calculate 2D standard deviation, with 0 mean, bounded by limits
normal distribution, standard deviation
Best Answer
To characterize the amount of 2D dispersion around the centroid, you just want the (root) mean squared distance,
$$\hat\sigma=\text{RMS} = \sqrt{\frac{1}{n}\sum_i\left((x_i - \bar{x})^2 + (y_i - \bar{y})^2\right)}.$$
In this formula, $(x_i, y_i), i=1, 2, \ldots, n$ are the point coordinates and their centroid (point of averages) is $(\bar{x}, \bar{y}).$
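As a minimal sketch of the RMS formula above, the following Python snippet computes the centroid and the root mean squared distance from a list of coordinates. The function name `centroid_and_rms` and the sample positions in `balls` are made up for illustration; real positions would come from the computer-vision step.

```python
import math

def centroid_and_rms(points):
    """Return ((x_bar, y_bar), rms) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Mean of the squared distances from each point to the centroid.
    mean_sq = sum((x - x_bar) ** 2 + (y - y_bar) ** 2 for x, y in points) / n
    return (x_bar, y_bar), math.sqrt(mean_sq)

# Hypothetical ball positions (corners of a square), for illustration only.
balls = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center, rms = centroid_and_rms(balls)
print(center, rms)
```

Note that the divisor is $n$, as in the formula, not $n-1$.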
The question asks for the distribution of the distances. When the balls have an isotropic bivariate Normal distribution around their centroid--which is a standard and physically reasonable assumption--the squared distance is proportional to a chi-squared distribution with two degrees of freedom (one for each coordinate). This is a direct consequence of one definition of the chi-squared distribution as a sum of squares of independent standard normal variables, because
$$x_i - \bar{x} = \frac{n-1}{n}x_i - \sum_{j\ne i}\frac{1}{n}x_j$$
is a linear combination of independent normal variates with expectation
$$\mathbb{E}[x_i - \bar{x}] = \frac{n-1}{n}\mathbb{E}[x_i] -\sum_{j\ne i}\frac{1}{n}\mathbb{E}[x_j] = 0.$$
Writing the common variance of the $x_i$ as $\sigma^2$,
$$\mathbb{E}[\left(x_i -\bar{x}\right)^2]=\text{Var}(x_i - \bar{x}) = \left(\frac{n-1}{n}\right)^2\text{Var}(x_i) + \sum_{j\ne i}\left(\frac{1}{n}\right)^2\text{Var}(x_j) = \frac{n-1}{n}\sigma^2.$$
The assumption of isotropy is that the $y_j$ have the same distribution as the $x_i$ and are independent of them, so an identical result holds for the distribution of $(y_i - \bar{y})^2$. This establishes the constant of proportionality: the squares of the distances have a chi-squared distribution with two degrees of freedom, scaled by $\frac{n-1}{n}\sigma^2$.
The most severe test of these equations is the case $n=2$, for then the fraction $\frac{n-1}{n}$ differs the most from $1$. By simulating the experiment, both for $n=2$ and $n=40$, and overplotting the histograms of squared distances with the scaled chi-squared distributions (in red), we can verify this theory.
Each row shows the same data: on the left the x-axis is logarithmic; on the right it shows the actual squared distance. The true value of $\sigma$ for these simulations was set to $1$.
These results are for 100,000 iterations with $n=2$ and 50,000 iterations with $n=40$. The agreements between the histograms and chi-squared densities are excellent.
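A rough way to reproduce this kind of check is sketched below, in Python rather than R and without the plots. It compares the empirical distribution of simulated squared distances with the scaled $\chi^2_2$ distribution function, $1 - \exp\left(-t / \left(2\cdot\frac{n-1}{n}\sigma^2\right)\right)$. The function names, the seed, the thresholds, and the (smaller) iteration counts are illustrative choices, not those of the original simulation.

```python
import math
import random

def simulate_squared_distances(n, iterations, sigma=1.0, seed=0):
    """Draw `iterations` samples of n isotropic normal points and return
    all squared distances from each point to its sample centroid."""
    rng = random.Random(seed)
    squared = []
    for _ in range(iterations):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        ys = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        squared.extend((x - x_bar) ** 2 + (y - y_bar) ** 2
                       for x, y in zip(xs, ys))
    return squared

def scaled_chi2_cdf(t, n, sigma=1.0):
    """P(d^2 <= t) when d^2 is ((n-1)/n) * sigma^2 times a chi-squared(2)."""
    scale = (n - 1) / n * sigma ** 2
    return 1.0 - math.exp(-t / (2.0 * scale))

def max_cdf_gap(squared, n, thresholds):
    """Largest absolute gap between empirical and theoretical CDF values."""
    gaps = []
    for t in thresholds:
        empirical = sum(d <= t for d in squared) / len(squared)
        gaps.append(abs(empirical - scaled_chi2_cdf(t, n)))
    return max(gaps)

thresholds = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
gap_n2 = max_cdf_gap(simulate_squared_distances(2, 20000), 2, thresholds)
gap_n40 = max_cdf_gap(simulate_squared_distances(40, 2000), 40, thresholds)
print(gap_n2, gap_n40)
```

Small gaps for both $n=2$ and $n=40$ are what the theory predicts. A full comparison would overplot histograms and densities as described above.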
Although $\sigma^2$ is unknown, it can be estimated in various ways. For instance, the mean squared distance should be $\frac{n-1}{n}\sigma^2$ times the mean of $\chi^2_2$, which is $2$. With $n=40$, for example, estimate $\sigma^2$ as $\frac{40}{39}/2$ times the mean squared distance. Thus an estimate of $\sigma$ would be $\sqrt{40/78}$ times the RMS distance; it is this rescaled estimate that $\hat\sigma$ denotes in the statements below. Using values of the $\chi^2_2$ distribution we can then say that:
Approximately 39% of the distances will be less than $\sqrt{39/40}\hat\sigma$, because 39% of a $\chi^2_2$ distribution is less than $1$.
Approximately 78% of the distances will be less than $\sqrt{3}$ times $\sqrt{39/40}\hat\sigma$, because 78% of a $\chi^2_2$ distribution is less than $3$.
And so on, for any multiple you care to use in place of $1$ or $3$. As a check, in the simulations for $n=40$ plotted previously, the actual proportions of squared distances less than $1, 2, \ldots, 10$ times $\frac{n-1}{n}\hat\sigma^2$ were
The theoretical proportions are
The agreement is excellent.
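The theoretical proportions follow from the $\chi^2_2$ distribution function, $P(\chi^2_2 < k) = 1 - e^{-k/2}$. The sketch below is a hedged Python illustration with hypothetical function names. It takes $\hat\sigma$ to be the rescaled estimate $\sqrt{n/(2(n-1))}$ times the RMS distance, converts a multiple $k$ into a radius, and tabulates the expected fraction of balls inside it. The `rms` value is a placeholder, not measured data.

```python
import math

def estimate_sigma(rms, n):
    """Estimate sigma from the RMS distance, using
    E[d^2] = 2 * (n-1)/n * sigma^2."""
    return math.sqrt(n / (2.0 * (n - 1))) * rms

def fraction_within(k):
    """P(chi-squared with 2 degrees of freedom < k)."""
    return 1.0 - math.exp(-k / 2.0)

def radius_for_multiple(k, sigma_hat, n):
    """Radius r such that d < r is equivalent to
    d^2 < k * (n-1)/n * sigma_hat^2."""
    return math.sqrt(k * (n - 1) / n) * sigma_hat

n = 40
rms = 1.0  # placeholder RMS distance, for illustration only
sigma_hat = estimate_sigma(rms, n)

# Expected fraction of balls within each radius, for k = 1, 2, ..., 10.
theoretical = [round(fraction_within(k), 4) for k in range(1, 11)]
for k, p in zip(range(1, 11), theoretical):
    print(k, round(radius_for_multiple(k, sigma_hat, n), 4), p)
```

For $k=1$ and $k=3$ this gives the 39% and 78% figures quoted above.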
Here is R code to conduct and analyze the simulations.