How to compare proportions across different groups with varying population sizes

I am trying to understand which statistical measures I can use to compare four groups with different population sizes, so that I can tell which group is worst off (highest probability of death, most vulnerable, weakest, or whatever measure of strength is suitable) and which group is best off.

Total Population
------------------
Group A: 100
Group B: 150
Group C: 50
Group D: 900

Over the course of one year, I have recorded the following information:

Group    Deaths
----------------
A        40
B        60
C        20
D        360

I was thinking I could simply calculate the probability of death for each group by dividing the numbers in the second table by the numbers in the first. This makes sense based on the definition of probability, but I am not sure whether there is a better way. Group D has a population of 900, which makes me think the estimate of its probability should be more reliable, whereas for Group C (population 50) we don't really have enough data to draw any conclusions.
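The division described above can be sketched in a few lines of Python (the question is tagged r, but the arithmetic is identical in any language). The dictionary names are illustrative; the counts are copied from the two tables. Applied to this data, it shows that every group, not only C and D, has an estimated death probability of 0.40:

```python
# Population and death counts copied from the two tables in the question.
population = {"A": 100, "B": 150, "C": 50, "D": 900}
deaths = {"A": 40, "B": 60, "C": 20, "D": 360}

# Empirical probability of death: deaths divided by group population.
death_rate = {g: deaths[g] / population[g] for g in population}

for group, p in death_rate.items():
    print(f"Group {group}: {deaths[group]}/{population[group]} = {p:.2f}")
```

This point estimate says nothing about how precise each value is, which is exactly the concern raised for Group C.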

In this case, Groups C and D have population sizes that differ by hundreds (50 versus 900), yet both have the same probability of death, which would lead us to conclude that Group C and Group D have the same properties (whatever they are). How do I deal with this (if it is a problem in the first place)? Is there another level of normalization beyond dividing each count by its group population?

Note: If it helps, I also have the approximate date on which each individual died, in the following form:

Death_Date Group
------------------
2011-01-23 Group A
2011-01-23 Group A
2011-01-25 Group A
...
2011-01-25 Group C
...
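Death dates in this form are the input for survival analysis. Below is a minimal Kaplan-Meier sketch in Python, assuming a closed cohort: every individual is followed for the whole year and nobody leaves early, so there is no censoring. The function name `survival_curve` is hypothetical, and the example uses only the three Group A rows actually shown above (the `...` rows are not reproduced):

```python
from collections import Counter
from datetime import date


def survival_curve(death_dates, population):
    """Kaplan-Meier estimate of S(t) for one group.

    Assumption: closed cohort with no censoring (everyone is followed
    for the full year, so the only way to leave the risk set is death).
    """
    counts = Counter(death_dates)   # number of deaths on each date
    at_risk = population            # individuals still alive before each date
    survival = 1.0
    curve = []
    for day in sorted(counts):
        d = counts[day]
        survival *= 1 - d / at_risk  # product-limit step: (1 - d_i / n_i)
        at_risk -= d
        curve.append((day, survival))
    return curve


# Illustration only: the three Group A rows shown in the question.
group_a = [date(2011, 1, 23), date(2011, 1, 23), date(2011, 1, 25)]
for day, s in survival_curve(group_a, population=100):
    print(day, round(s, 4))
```

Without censoring, this estimate reduces to the fraction of the group still alive on each date; the product-limit form matters once people can drop out of follow-up. In R, the survival package's `survfit()` computes the same curves, and `survdiff()` gives a log-rank test between groups.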

Best Answer

From what you describe about intuitively having more confidence in estimates from large populations, I think what you want is a confidence interval on the probability of death for each group. You should then get a much narrower confidence interval for Group D than for Group C. You can choose whatever confidence level you want (for example 95%). If you want to model the time-dependent effects, using the death dates you mentioned in your note, then you may want to look at survival analysis.
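The answer does not name a particular interval, so the sketch below uses the Wilson score interval for a binomial proportion as one reasonable choice (alternatives include the exact Clopper-Pearson interval returned by R's `binom.test()`, or the simple normal approximation). The function name `wilson_interval` is hypothetical and the confidence level defaults to an assumed 95%:

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(deaths, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion deaths / n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = deaths / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


population = {"A": 100, "B": 150, "C": 50, "D": 900}
deaths = {"A": 40, "B": 60, "C": 20, "D": 360}

intervals = {g: wilson_interval(deaths[g], population[g]) for g in population}
for g, (lo, hi) in intervals.items():
    print(f"Group {g}: p = 0.40, 95% CI [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

All four intervals are centred near 0.40 and overlap, so these counts alone do not separate the groups; the larger population only makes Group D's estimate more precise, with the interval width shrinking roughly in proportion to 1/sqrt(n).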