[Math] Weighted Standard Deviation for Histogram Bin Height

standard-deviation, statistics

I'm plotting some binned data in the form of a histogram. Say I have 10 data points, each consisting of the bin it falls in and a "height". Then I might have something like:

| Bin | Height |
|-----|--------|
| 0 | 2.2 |
| 1 | 1.3 |
| 2 | 0.1 |
| 0 | 2.4 |
| 2 | 0.28 |
| 1 | 0.8 |
| 0 | 1.8 |
| 1 | 1.0 |
| 0 | 2.6 |
| 0 | 2.2 |

I want to plot this, with the height of each bin being the sum of the heights of the pieces in that bin (so above, the full height of bin 2 would be 0.1 + 0.28 = 0.38). I'd like to find the standard deviation of the height of a bin. I know that the bin each point lands in is drawn from a uniform distribution, but set up so that 0 is more likely than 2, since the range of the uniform distribution that corresponds to 0 is wider than the range for 2. I know these ranges. The heights themselves aren't generated using the uniform distribution.
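For concreteness, the bin totals described above can be computed as in the following minimal sketch. The variable names (`points`, `bin_heights`) are made up for illustration; the data are the ten rows from the table.

```python
from collections import defaultdict

# (bin, height) pairs copied from the example table in the question
points = [
    (0, 2.2), (1, 1.3), (2, 0.1), (0, 2.4), (2, 0.28),
    (1, 0.8), (0, 1.8), (1, 1.0), (0, 2.6), (0, 2.2),
]

# Full height of a bin = sum of the heights of the points in that bin
bin_heights = defaultdict(float)
for bin_id, height in points:
    bin_heights[bin_id] += height

for bin_id in sorted(bin_heights):
    # Rounded only for display, to hide floating-point noise
    print(bin_id, round(bin_heights[bin_id], 2))
```

This gives the totals 11.2, 3.1 and 0.38 for bins 0, 1 and 2.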

Update: how I get the heights – I start off with everyone in some bin, say 0, with each person having height 1. Then through some process, I get probabilities to move each one into another bin:

| Bin to move to | Weight to move |
|----------------|----------------|
| 0 | 1 |
| 1 | 0.4 |
| 2 | 0.2 |

So then I add these weights up to get, say, 1 + 0.4 + 0.2 = 1.6, and use my uniform distribution to decide which bin the person moves to (or whether they stay in 0, depending on what I get). Then the "height" of the person is 1.6. If I do this update procedure multiple times, the total height is the product of these sums over the steps. I want to add up all these total heights for everyone in each bin, and get a standard deviation on that sum, so I'd have something like

0 – 11.2 +/- ??

1 – 3.1 +/- ??

2 – 0.38 +/- ??
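The update procedure described in the question might be sketched as follows. This is only one reading of it, under stated assumptions: the probability of landing in a bin is taken to be its weight divided by the sum of the weights, every person uses the same weight table at every step (in the real data the weights evidently differ, since the example heights do), and the number of steps (two) is arbitrary. The names `update_step`, `WEIGHTS` and `people` are hypothetical.

```python
import random

# Weights from the example table; assumed identical for every person and step
WEIGHTS = {0: 1.0, 1: 0.4, 2: 0.2}

def update_step(height, weights, rng):
    """Pick the next bin with probability weight/total; scale height by total."""
    total = sum(weights.values())      # 1.0 + 0.4 + 0.2 = 1.6 for WEIGHTS
    u = rng.uniform(0.0, total)        # uniform draw over the stacked ranges
    upper = 0.0
    for bin_id, w in weights.items():
        upper += w
        if u < upper:                  # u fell inside this bin's range
            return bin_id, height * total
    return bin_id, height * total      # edge case u == total: use the last bin

rng = random.Random(0)                 # fixed seed, only for reproducibility
people = []
for _person in range(10):
    bin_id, height = 0, 1.0            # everyone starts in bin 0 with height 1
    for _step in range(2):             # two update steps (assumed count)
        bin_id, height = update_step(height, WEIGHTS, rng)
    people.append((bin_id, height))

# Full height of each bin = sum of the heights of the people who ended up there
bin_totals = {}
for bin_id, height in people:
    bin_totals[bin_id] = bin_totals.get(bin_id, 0.0) + height
```

With a fixed weight table every person ends with the same height (the product of the per-step sums), so all the randomness in a bin's total comes from how many people land in that bin.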

Best Answer

Umm, from what you wrote, I think I would just calculate the unbiased estimate of standard deviation of "full heights" of the three bins:

$$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}$$

where $x_i$ is the "full height" of bin $i$, $\overline{x}$ is the mean of the full heights, and $N$ is the number of bins.
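As a small illustration of this formula, here it is applied to the three full heights from the question (11.2, 3.1, 0.38). It uses `statistics.stdev` from the Python standard library, which computes the sample standard deviation with the $N-1$ denominator. Note that this measures the spread between the bins, which may not be the per-bin uncertainty the question is after.

```python
import statistics

# Full heights of bins 0, 1 and 2 from the question
full_heights = [11.2, 3.1, 0.38]

# Sample standard deviation: sqrt( sum((x_i - mean)^2) / (N - 1) )
s = statistics.stdev(full_heights)
print(s)
```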

But it seems to me you are probably looking for something else. Could you please clarify what you mean by your sample being drawn from a uniform distribution? Maybe also include what you are trying to achieve and what sort of data this is...

PS: I am sorry, but I don't seem to be able to ask for clarification in any other way than by giving some answer.
