[Math] Measuring Variance of 2D Data Points

descriptive-statistics, statistics

I have a list of weighted data points on a 2D plane in the form $(x, y)$. I believe the mean can be calculated as $\left(\frac{\sum{x_i\times w_i}}{\sum w_i}, \frac{\sum{y_i\times w_i}}{\sum w_i}\right)$.

What is the best way to calculate a single value which will accurately describe the spread of the data points?

Best Answer

You have computed the weighted mean $$\mathbf{\bar x}_* = \sum w_i\mathbf{x}_i$$ (assume $\sum w_i = 1$ for ease of notation). A natural single-number measure of spread is then the weighted variance $$\sum w_i|\mathbf{x}_i -\mathbf{\bar x}_*|^2$$ or, if you want an unbiased estimate, the unbiased weighted variance $$\frac{1}{1-\sum w_i^2}\sum w_i|\mathbf{x}_i -\mathbf{\bar x}_*|^2$$

Here, $|\cdot|$ denotes the ordinary Euclidean norm $|\mathbf u| = \sqrt{u_1^2+u_2^2}$, so $|\mathbf{ u }|^2 = u_1^2+u_2^2$.
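
For a concrete check, here is a minimal Python sketch of the formulas above. The function name `weighted_spread` and its argument layout are made up for illustration. It rescales the weights so that $\sum w_i = 1$, as the answer assumes, and returns the weighted mean together with both variance versions.

```python
def weighted_spread(points, weights):
    """Return (mean, variance, unbiased_variance) for weighted 2D points.

    `points` is a sequence of (x, y) pairs and `weights` a sequence of
    positive numbers of the same length. Both names are illustrative.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]  # rescale so the weights sum to 1

    # Weighted mean, computed coordinate by coordinate.
    mean_x = sum(wi * x for wi, (x, _) in zip(w, points))
    mean_y = sum(wi * y for wi, (_, y) in zip(w, points))

    # Weighted variance: sum of w_i times squared Euclidean distance to the mean.
    var = sum(
        wi * ((x - mean_x) ** 2 + (y - mean_y) ** 2)
        for wi, (x, y) in zip(w, points)
    )

    # Unbiased version: divide by 1 - sum(w_i^2).
    # This denominator is zero when a single point carries all the weight,
    # so the function raises ZeroDivisionError in that case.
    unbiased = var / (1.0 - sum(wi * wi for wi in w))

    return (mean_x, mean_y), var, unbiased


# Example: four equally weighted corners of a square with side 2.
mean, var, unbiased = weighted_spread([(0, 0), (2, 0), (0, 2), (2, 2)], [1, 1, 1, 1])
```

With equal weights the unbiased version reduces to the usual $\frac{1}{n-1}$ sample variance, summed over the two coordinates. Taking the square root of either value gives a spread in the same units as the data.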
