Regression – Applying Empirical Logit Transformation on Percentage Data

data transformation, logit, proportion, regression

I have already applied the logit transform to my outcome variables (which are expressed as percentages). However, because my data include a lot of zeros in some instances, this gives me -INF values, which makes the data hard to analyse.

I have now tried an empirical logit transform, adding the smallest non-zero proportion to the numerator and denominator of my variables to remove the -INF values (as suggested in http://www.esajournals.org/doi/abs/10.1890/10-0340.1).

However, now my data are very non-normal again. I have tried experimenting with different error terms to add to the logit transform, but so far I have had no luck.

Is there any way I can find a value to add to my transformation to ensure normality?

Best Answer

I've had luck with setting epsilon to half of the smallest non-zero value and replacing all 0 values with epsilon and all 1 values with 1-epsilon. Then apply the logit transformation.

This method keeps the original form of the logit transformation, but allows 1 and 0 to be transformed to values that match the overall shape of the intended transformation (note the black dots in the figure at raw=0 and 1). In particular, it preserves the quality that 0.5 is transformed to 0, and the rest of the values are symmetric.
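The replacement step described above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the original answer: the function name `adjusted_logit`, its `eps` argument, and the example values are hypothetical, and it assumes the percentages have already been divided by 100 so that they lie in [0, 1].

```python
import math

def adjusted_logit(props, eps=None):
    """Logit transform with exact 0s and 1s replaced before transforming.

    props : list of proportions in [0, 1] (hypothetical input)
    eps   : replacement offset; if omitted, half the smallest non-zero value
    """
    if eps is None:
        # Half of the smallest non-zero proportion, as described in the answer.
        eps = min(p for p in props if p > 0) / 2.0

    out = []
    for p in props:
        if p == 0:
            p = eps          # 0 becomes eps
        elif p == 1:
            p = 1 - eps      # 1 becomes 1 - eps
        # All other values go through the ordinary logit unchanged.
        out.append(math.log(p / (1 - p)))
    return out

# Made-up example data containing both boundary values.
props = [0.0, 0.02, 0.25, 0.5, 0.75, 1.0]
z = adjusted_logit(props)
```

With these values, 0.5 still maps to 0, and each pair `p` and `1 - p` (including the replaced 0 and 1) maps to values of equal magnitude and opposite sign, which is the symmetry mentioned above. Whether the result is close enough to normal still has to be checked on the actual data.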

On the other hand, adding the smallest non-zero value as described in the paper changes the shape of the curve and destroys the symmetry.

Figure: Comparison of two ways to adjust the logit transformation to deal with zeros