I have four discrete categories, each with a sample count, and I'd like some measure of variance where the minimum occurs when the counts are evenly divided among all four categories and the maximum occurs when all counts are in one category and the other three are zero. Is there any standard measure or calculation that does something like this?
[Math] Variance measure for categorical data
discrete mathematics, order-statistics, statistics
Best Answer
You can use the total sum of squares: http://en.wikipedia.org/wiki/Total_sum_of_squares
It measures how far the data points are from their average.
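As a minimal sketch of that calculation, assuming the four category counts themselves are treated as the data points (the function name `total_sum_of_squares` is made up for illustration):

```python
def total_sum_of_squares(counts):
    """Sum of squared deviations of each count from the mean count."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts)


print(total_sum_of_squares([25, 25, 25, 25]))  # 0.0 (evenly divided)
print(total_sum_of_squares([0, 100, 0, 0]))    # 7500.0 (all in one category)
```

Dividing the result by the number of categories would give the population variance of the counts, if that scale is more convenient.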
If you are looking for a simpler, more intuitive measure, you can just sum the absolute differences between each count and the average count.
R = |(count A-average)|+|(count B-average)|+|(count C-average)|+|(count D-average)|
For example: total count = 100, average of count = 100/4 = 25
If count = (25, 25, 25, 25), then R = |(25-25)|+|(25-25)|+|(25-25)|+|(25-25)|=0
If count = (0, 100, 0, 0), then R = |(0-25)|+|(100-25)|+|(0-25)|+|(0-25)|=150
If count = (90, 10, 0, 0), then R = |(90-25)|+|(10-25)|+|(0-25)|+|(0-25)|=130
Note: This is basically the same idea as the total sum of squares, except that the differences are not squared.
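The R calculation above can be sketched in Python as follows. The function names are hypothetical, and the normalized variant is an addition not in the answer above: it assumes at least two categories and a nonzero total, and divides by the value R takes when every count falls in one category, so the result runs from 0 (even split) to 1 (all in one category).

```python
def sum_abs_deviation(counts):
    """R: sum of absolute differences between each count and the mean count."""
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts)


def normalized_abs_deviation(counts):
    """R scaled to [0, 1]; assumes len(counts) >= 2 and sum(counts) > 0."""
    n = len(counts)
    total = sum(counts)
    # R when all counts sit in a single category: 2 * total * (n - 1) / n
    max_r = 2 * total * (n - 1) / n
    return sum_abs_deviation(counts) / max_r


# Print R and its normalized value for the three examples.
for counts in [(25, 25, 25, 25), (0, 100, 0, 0), (90, 10, 0, 0)]:
    print(counts, sum_abs_deviation(counts), normalized_abs_deviation(counts))
```

The normalization makes results comparable across different totals or numbers of categories, which the raw R value is not.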