[Math] Variance measure for categorical data

discrete mathematics, order-statistics, statistics

I have four discrete categories, each with a sample count, and I'd like some measure of variance where the minimum occurs when the counts are evenly divided among all four categories and the maximum occurs when all counts are in one category and the other three are zero. Is there a standard measure or calculation that does something like this?

Best Answer

You can use the total sum of squares: http://en.wikipedia.org/wiki/Total_sum_of_squares

It measures how far the data points are from their average, by summing the squared difference between each point and the average.
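As a rough sketch of how this could be computed for category counts (the function name `total_sum_of_squares` is just an illustrative choice, not from any library), assuming the "data points" are the four counts and the average is the mean count:

```python
def total_sum_of_squares(counts):
    """Sum of squared deviations of each count from the mean count."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts)


# Evenly divided counts give the minimum, 0.
print(total_sum_of_squares([25, 25, 25, 25]))  # 0.0
# All counts in one category give the maximum for a total of 100.
print(total_sum_of_squares([0, 100, 0, 0]))    # 7500.0
# An uneven split falls in between.
print(total_sum_of_squares([90, 10, 0, 0]))    # 5700.0
```

Because the deviations are squared, a single large deviation is weighted more heavily than several small ones.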


If you are looking for something simpler and more intuitive, you can sum the absolute differences between each count and the average count.

R = |(count A-average)|+|(count B-average)|+|(count C-average)|+|(count D-average)|

For example: total count = 100, average of count = 100/4 = 25

If count = (25, 25, 25, 25) then R = |(25-25)|+|(25-25)|+|(25-25)|+|(25-25)|=0

If count = (0, 100, 0, 0) , then R = |(0-25)|+|(100-25)|+|(0-25)|+|(0-25)|=150

If count = (90, 10, 0, 0), then R = |(90-25)|+|(10-25)|+|(0-25)|+|(0-25)|=130

Note: This is basically the same idea as the total sum of squares, except that the differences are taken in absolute value instead of being squared.
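A minimal sketch of R in Python, in case it helps to check the arithmetic (the function name `sum_abs_deviation` is made up for illustration):

```python
def sum_abs_deviation(counts):
    """R: sum of absolute deviations of each count from the mean count."""
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts)


print(sum_abs_deviation([25, 25, 25, 25]))  # 0.0
print(sum_abs_deviation([0, 100, 0, 0]))    # 150.0
print(sum_abs_deviation([90, 10, 0, 0]))    # 130.0
```

With a total of 100 over four categories, R ranges from 0 (evenly divided) to 150 (everything in one category), so dividing by 150 would rescale it to the interval [0, 1] if a normalized value is wanted.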
