Square Root Transformation – Benefits for Count Data

count-data, data transformation, generalized linear model, poisson distribution, variance-stabilizing

It is often recommended to take the square root when you have count data. (For some examples on CV, see @HarveyMotulsky's answer here, or @whuber's answer here.) On the other hand, when fitting a generalized linear model with a Poisson-distributed response variable, the log is the canonical link. This is somewhat like taking a log transformation of your response data (although, more accurately, it is taking a log transformation of $\lambda$, the parameter that governs the response distribution). Thus, there is some tension between the two approaches.
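To make the parenthetical distinction concrete, here is a sketch in standard GLM notation (the symbols $\beta_0$, $\beta_1$, $x_i$ and $g$ are introduced only for this illustration). The log link models the log of the mean, while a transformation models the mean of the transformed data, and the two are not the same quantity whenever $g$ is nonlinear:

```latex
\begin{aligned}
\text{Poisson GLM (log link):}\quad & Y_i \sim \operatorname{Poisson}(\lambda_i), \qquad \log \lambda_i = \log \operatorname{E}[Y_i] = \beta_0 + \beta_1 x_i \\
\text{Transform, then fit:}\quad & \operatorname{E}[g(Y_i)] = \beta_0 + \beta_1 x_i, \qquad g(y) = \sqrt{y} \ \text{or}\ g(y) = \log(y + c) \\
\text{In general:}\quad & \operatorname{E}[g(Y_i)] \neq g\big(\operatorname{E}[Y_i]\big) \quad \text{for nonlinear } g
\end{aligned}
```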

  • How do you reconcile this (apparent) discrepancy?
  • Why would the square root be better than the logarithm?

Best Answer

The square root is approximately variance-stabilizing for the Poisson. There are a number of variations on the square root that improve its properties, such as adding $\frac{3}{8}$ before taking the square root, or the Freeman-Tukey transformation ($\sqrt{X}+\sqrt{X+1}$, though it's often adjusted for the mean as well).
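A first-order (delta-method) sketch of why the square root is approximately variance-stabilizing, for $Y \sim \operatorname{Poisson}(\lambda)$ so that $\operatorname{Var}(Y) = \lambda$:

```latex
\begin{aligned}
\operatorname{Var}\big(g(Y)\big) &\approx g'(\lambda)^2 \operatorname{Var}(Y) = g'(\lambda)^2\,\lambda \\
g(y) = \sqrt{y} \;\Rightarrow\; g'(\lambda) = \tfrac{1}{2\sqrt{\lambda}} \;\Rightarrow\; \operatorname{Var}\big(\sqrt{Y}\big) &\approx \tfrac{1}{4\lambda}\,\lambda = \tfrac14
\end{aligned}
```

The approximation degrades for small $\lambda$. Choosing $c=\frac38$ in $\sqrt{Y+c}$ makes the next ($1/\lambda$) term of the variance expansion vanish, which is why that variant holds up better at smaller means. The Freeman-Tukey sum behaves roughly like $2\sqrt{Y}$, so its variance sits near $1$ rather than $\frac14$.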

In the plots below, we have a Poisson $Y$ plotted against a predictor $x$ (with the mean of $Y$ proportional to $x$), then $\sqrt{Y}$ vs $\sqrt{x}$, and then $\sqrt{Y+\frac{3}{8}}$ vs $\sqrt{x}$.

[Figure: $Y$ vs $x$; $\sqrt{Y}$ vs $\sqrt{x}$; $\sqrt{Y+\frac{3}{8}}$ vs $\sqrt{x}$]
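A small simulation of the setup described above (Poisson $Y$ with mean proportional to $x$) can show the same effect numerically. This is a sketch under assumptions: the proportionality constant of 5, the $x$ values, the sample size and the seed are arbitrary choices, not the values behind the original plots, and `poisson_draw` and `spread_by_x` are hypothetical helpers written only for this illustration. For each $x$ it reports the standard deviation of the raw counts and of three square-root-type transforms: the raw SD grows like the square root of the mean, while the transformed SDs stay roughly constant once the mean is not small.

```python
import math
import random
import statistics


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for moderate lam (exp(-lam) must not underflow)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def spread_by_x(xs, slope=5.0, n=2000, seed=1):
    """For each x, draw n Poisson counts with mean slope * x and return the SD
    of the raw counts and of three square-root-type transforms."""
    rng = random.Random(seed)
    rows = []
    for x in xs:
        ys = [poisson_draw(slope * x, rng) for _ in range(n)]
        rows.append({
            "x": x,
            "raw": statistics.stdev(ys),
            "sqrt": statistics.stdev([math.sqrt(y) for y in ys]),
            "sqrt_3_8": statistics.stdev([math.sqrt(y + 3 / 8) for y in ys]),
            "freeman_tukey": statistics.stdev([math.sqrt(y) + math.sqrt(y + 1) for y in ys]),
        })
    return rows


if __name__ == "__main__":
    for row in spread_by_x([1, 2, 4, 10]):
        print({k: round(v, 3) for k, v in row.items()})
```

The square-root columns should hover near $\frac12$ and the Freeman-Tukey column near $1$, matching the delta-method approximation.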

The square root transformation somewhat improves symmetry, though not as well as the $\frac{2}{3}$ power does [1]:

[Figure: the same data after the $\frac{2}{3}$ power transformation]

If you particularly want near-normality (as long as the Poisson mean is not very small) and either don't care about heteroscedasticity or can adjust for it, try the $\frac{2}{3}$ power.
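The symmetry comparison can also be checked without plots. A delta-method expansion suggests why the $\frac{2}{3}$ power wins: for $Y\sim\operatorname{Poisson}(\lambda)$ the skewness of $Y^p$ is approximately $(3p-2)/\sqrt{\lambda}$, which vanishes at $p=\frac23$ and is negative (a mild left skew) at $p=\frac12$. The same expansion gives $\operatorname{Var}(Y^{2/3})\approx\frac49\lambda^{1/3}$, which still grows with $\lambda$; that is the heteroscedasticity you would need to adjust for. The sketch below computes the skewness exactly, up to truncating the Poisson sum; `poisson_pmf` and `skewness_of_transform` are hypothetical helpers, and the means used are arbitrary.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Evaluated on the log scale to avoid overflow in k!
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def skewness_of_transform(g, lam: float, kmax=None) -> float:
    """Skewness of g(Y) for Y ~ Poisson(lam), summing the pmf up to kmax."""
    if kmax is None:
        kmax = int(lam + 40 * math.sqrt(lam) + 50)  # the tail beyond this is negligible
    ks = range(kmax + 1)
    w = [poisson_pmf(k, lam) for k in ks]
    vals = [g(k) for k in ks]
    mean = sum(wi * v for wi, v in zip(w, vals))
    m2 = sum(wi * (v - mean) ** 2 for wi, v in zip(w, vals))
    m3 = sum(wi * (v - mean) ** 3 for wi, v in zip(w, vals))
    return m3 / m2 ** 1.5


transforms = {
    "identity": lambda y: float(y),
    "sqrt": math.sqrt,
    "power_2_3": lambda y: y ** (2 / 3),
}

if __name__ == "__main__":
    for lam in (5, 20, 100):
        print(lam, {name: round(skewness_of_transform(g, lam), 4)
                    for name, g in transforms.items()})
```

For the identity transform the result is exactly the Poisson skewness $1/\sqrt{\lambda}$, which is a handy sanity check on the truncation.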

The canonical link is not generally a particularly good transformation for Poisson data: log zero is a particular issue, heteroscedasticity is another, and you can also get left-skewness even when there are no 0's. If the smallest values are not too close to 0, it can be useful for linearizing the mean. It's a good 'transformation' for the conditional population mean of a Poisson in a number of contexts, but not always for the Poisson data themselves.

However, if you do want to transform, one common strategy is to add a constant before taking logs, $y^*=\log(y+c)$, which avoids the $0$ issue. In that case we should consider what constant to add. Without getting too far from the question at hand, values of $c$ between $0.4$ and $0.5$ work very well (e.g. in relation to bias in the slope estimate) across a range of values of the Poisson mean $\mu$. I usually just use $\frac12$ since it's simple, with values around $0.43$ often doing just slightly better.
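One way to see why $c$ near $\frac12$ works: for moderately large $\mu$, a second-order expansion gives $\operatorname{E}[\log(Y+c)] \approx \log\mu + (c-\frac12)/\mu$, so $c=\frac12$ removes the leading bias. The sketch below computes $\operatorname{E}[\log(Y+c)]$ exactly (by summing the Poisson pmf) for a few values of $c$, along with the secant slope of $\operatorname{E}[\log(Y+c)]$ against $\log\mu$ between two means. With observations at only those two means, the expected OLS slope of $\log(Y+c)$ on $\log\mu$ is exactly this secant slope, whose ideal value is $1$. The means $2$ and $20$, the grid of $c$ values, and the helpers `mean_log_shifted`, `bias` and `secant_slope` are my own hypothetical choices; this is not the specific comparison behind the $0.4$ to $0.5$ recommendation.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    # Log-scale evaluation avoids overflow in k! for larger k
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def mean_log_shifted(mu: float, c: float) -> float:
    """E[log(Y + c)] for Y ~ Poisson(mu), summing the pmf up to a generous cutoff."""
    kmax = int(mu + 40 * math.sqrt(mu) + 50)
    return sum(poisson_pmf(k, mu) * math.log(k + c) for k in range(kmax + 1))


def bias(mu: float, c: float) -> float:
    """Distance of E[log(Y + c)] from the target log(mu)."""
    return mean_log_shifted(mu, c) - math.log(mu)


def secant_slope(c: float, mu_lo: float = 2.0, mu_hi: float = 20.0) -> float:
    """Slope of E[log(Y + c)] against log(mu) between two means (ideal value: 1)."""
    rise = mean_log_shifted(mu_hi, c) - mean_log_shifted(mu_lo, c)
    return rise / (math.log(mu_hi) - math.log(mu_lo))


if __name__ == "__main__":
    for c in (0.1, 0.25, 0.4, 0.43, 0.5, 1.0):
        print(c, [round(bias(mu, c), 4) for mu in (1, 2, 5, 20)], round(secant_slope(c), 4))
```

Adding $1$, a common default, noticeably flattens the slope over this range, while constants in the $0.4$ to $0.5$ region keep it close to $1$.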

As for why people choose one transformation over another (or none), that really depends on what they're trying to achieve.

[1]: Plots patterned after Henrik Bengtsson's plots in his handout "Generalized Linear Models and Transformed Residuals" (see the first slide on p. 4). I added a little y-jitter and omitted the lines.
