Solved – Alternatives to ratios of counts if denominators can be zero

count-data, data-transformation

I have been given a bunch of data and asked to calculate proportions between two variables. The variables are both counts (this is social science stuff).

The problem is that there are a number of zeroes dotted throughout my dataset, which of course present a problem when calculating proportions.

Is there a way that I can transform the data to get rid of the zeros? I'm not going to analyze it, so I don't need it to satisfy any specific assumptions.

Best Answer

When reporting ordered or graded scales, working with simple descriptive summaries like

% improved $−$ % deteriorated

or

% ranking as good $−$ % ranking as bad

is sometimes helpful. In such summaries, the neutral or middle category is commonly (but not necessarily) omitted. Such a measure shows which of the two tails predominates, and by how much: if everybody improved, we get $100$, and if everybody got worse, we get $-100$.
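Written out with counts, the measure is a single fraction whose only denominator is the total. In the illustrative notation below (not taken from the sources cited), $n_+$ and $n_-$ are the counts in the two tails and $n_0$ is the neutral count, set to $0$ when that category is omitted:

$$
D = 100\left(\frac{n_+}{n} - \frac{n_-}{n}\right) = 100\,\frac{n_+ - n_-}{n}, \qquad n = n_+ + n_- + n_0, \qquad -100 \le D \le 100.
$$

$D = 100$ exactly when $n_- = n_0 = 0$, and $D = -100$ exactly when $n_+ = n_0 = 0$. The only way to divide by zero is to have no observations at all. Keeping the neutral category in $n$ pulls $D$ toward $0$.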

In political terms, we could imagine an election in which these two categories are votes “for” and “against”; in that context, such measures may be described as plurality measures. (Is there a better general term, or a term that is standard in some field, for particular examples of such measures?) Whatever the terminology, such measures are discussed in Tukey (1977, pp. 498–502), Zeisel (1985, pp. 75–77), and Wilkinson (2005, pp. 57–58).

Naturally, the percent formulation is not compulsory: you could just as easily (in fact, a little more easily) work with proportions or fractions, with results ranging from $1$ to $-1$. In either case, a difference is the natural comparison whenever you are thinking in terms of the percent or proportion scale. Also, a ratio such as

% ranking as good / % ranking as bad

is less attractive when denominators are small: the result can be unstable, and if a denominator is ever 0, the ratio is undefined.
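A minimal Python sketch of both problems, using hypothetical small counts rather than any real data: one case at a time is moved from the "bad" tail to the "good" tail. Because both percentages share a denominator, their ratio equals the ratio of the counts. The helper names `ratio_or_none` and `diff_percent` are made up for illustration.

```python
from typing import Optional


def ratio_or_none(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is zero (undefined)."""
    return None if den == 0 else num / den


def diff_percent(good: int, bad: int) -> float:
    """% good minus % bad, out of good + bad (ranges from -100 to 100)."""
    total = good + bad
    if total == 0:
        raise ValueError("no observations: the difference is undefined too")
    return 100.0 * (good - bad) / total


# Hypothetical small counts: shift one case at a time from "bad" to "good".
for good, bad in [(3, 2), (4, 1), (5, 0)]:
    print(good, bad, ratio_or_none(good, bad), diff_percent(good, bad))
```

Each step moves the difference by the same 40 points (20.0, 60.0, 100.0), whereas the ratio jumps from 1.5 to 4.0 and is then undefined.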

Let us illustrate both points with the idea of looking at gender roles across a set of activities, and

% who are female $−$ % who are male

as a way of summarizing data on who does what. Suppose that, in a village, 21 women and no men do laundry, 11 women and four men fetch water, and 14 men and no women take care of cows. Then neither the male–female ratio nor the female–male ratio can be used throughout to summarize the balance of the sexes, because whenever zero is a denominator, the ratio is undefined. Even if no zeros are present, we should worry about how sensitive a ratio is to small counts. By contrast, the measure above is always practical: it is defined for every activity here and always lies between $-100$ and $100$.
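A minimal Python sketch of the village example, using the counts given above. The function names are hypothetical; ratios are reported as "undefined" rather than raising an error, so that both kinds of summary can be shown side by side with the raw counts.

```python
# Village counts from the example in the text: (women, men) per activity.
ACTIVITIES = {
    "laundry": (21, 0),
    "fetching water": (11, 4),
    "caring for cows": (0, 14),
}


def female_minus_male(women: int, men: int) -> float:
    """% female minus % male among those doing an activity (-100 to 100)."""
    total = women + men
    if total == 0:
        raise ValueError("nobody does this activity")
    return 100.0 * (women - men) / total


def fmt_ratio(num: int, den: int) -> str:
    """Format num / den to 2 decimals, or 'undefined' if den is zero."""
    return "undefined" if den == 0 else f"{num / den:.2f}"


for activity, (women, men) in ACTIVITIES.items():
    # Report the raw counts next to the summaries, as the answer recommends.
    print(
        f"{activity:16} women={women:2d} men={men:2d} "
        f"F/M={fmt_ratio(women, men):>9} M/F={fmt_ratio(men, women):>9} "
        f"%F-%M={female_minus_male(women, men):6.1f}"
    )
```

Each ratio is undefined for at least one activity, while the difference is 100.0 for laundry, 46.7 for fetching water, and -100.0 for caring for cows.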

All that said, the raw frequencies remain important and should be reported, or at least kept easily accessible: "1/3 of the cats showed improvement, 1/3 deterioration, but the other cat ran away" has an equivalent here too.

The calculations should be simple in your favourite software; for Stata specifics, see Cox (2007), on which this answer is based.

Cox, N. J. 2007. How do I calculate measures such as percent improved minus percent deteriorated? http://www.stata.com/support/faqs/data-management/plurality-measures/

Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Wilkinson, L. 2005. The Grammar of Graphics. 2nd ed. New York: Springer.

Zeisel, H. 1985. Say It with Figures. 6th ed. New York: Harper & Row.
