Is Hellinger transformation suitable to assess species abundance – environmental variable relationships

data transformation, hellinger, multivariate analysis, r, vegan

While working with a multivariate dataset, I noticed that the Hellinger transformation reveals relationships in the dataset that I did not see otherwise. The fact that a transformation affects species abundance – environmental variable relationships is, of course, nothing new. However, I am a bit skeptical about relationships that appear only after Hellinger transformation, since I do not completely understand how the data are transformed. OK, the equation is simple enough (stated for example here), but the transformation itself is not as intuitive as, for example, a logarithmic transformation. Here is an example using a subset of my dataset in R (looking at the plot should be enough to answer this question, but I paste a reproducible example in case someone wants it):

Copy the dataset from here and assign to x

x$sp1.hel <- x$sp1.raw
x$sp2.hel <- x$sp2.raw

x$sp1.log <- log(x$sp1.raw + 1)
x$sp2.log <- log(x$sp2.raw + 1)

library(vegan)

x[c("sp1.hel", "sp2.hel")] <- vegan::decostand(x[c("sp1.hel", "sp2.hel")], method = "hellinger")

library(reshape2)

mx <- reshape2::melt(x, id = "envvar")

mx$trans <- factor(sapply(strsplit(as.character(mx$variable), "\\."), "[", 2))

levels(mx$trans) <- c("Hellinger", "Log + 1", "None")

mx$species <- factor(sapply(strsplit(as.character(mx$variable), "\\."), "[", 1))

library(ggplot2)

ggplot(mx, aes(x = envvar, y = value, color = species)) + 
  geom_point(size = 1, alpha = 0.2) +
  geom_smooth() +
  facet_grid(trans~species, scales = "free") +
  theme_bw()

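[Figure: value against envvar for sp1 and sp2 (columns), faceted by transformation (rows: Hellinger, Log + 1, None), with points and smoothed trend lines]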

As you can see, there is probably no relationship between sp1 abundance and envvar in non-transformed or log + 1 transformed data, but such a relationship becomes apparent for Hellinger transformed data. Sp2 seems to correlate somewhat positively with envvar regardless of the transformation (see cor(x)).

If I were to Hellinger transform my dataset for a constrained multivariate ordination (RDA / CCA), I would probably end up concluding that sp1 and envvar have a rather strong relationship, even though such a relationship is not evident for non-transformed or log-transformed data.

Therefore I am wondering whether Hellinger transformation is a suitable method to examine species community – environmental variable relationships in constrained multivariate ordinations?

Best Answer

The Hellinger transformation of a vector of abundances is

$$ y_{ij}^{\prime} = \sqrt{\frac{y_{ij}}{y_{i+}}} $$

where I'm using $y_{i+}$ to indicate the sample total count over all $j = 1, \dots, m$ species, for the $i$th sample.

The first part, the fraction, turns abundances into proportional values out of the sample total count. In other words we are now very much thinking about relative species composition, not the magnitudes of abundance values.

The square root element is a commonly-applied transformation for moderately skewed data. Here it is reducing the effect of those $y_{ij}$ values that may be extremely large. It is similar in effect to the log transformation but not as strong a transformation.
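The two steps can be sketched by hand in base R. This is a minimal illustration on a small made-up abundance matrix (the values in `y` are hypothetical, not taken from the question's dataset):

```r
# Hypothetical abundance matrix: 3 samples (rows) x 2 species (columns)
y <- matrix(c(10,  0, 5,    # sp1 counts in samples 1..3
              90, 20, 5),   # sp2 counts in samples 1..3
            nrow = 3,
            dimnames = list(NULL, c("sp1", "sp2")))

# Step 1: divide each count by its sample (row) total -> proportions
p <- y / rowSums(y)

# Step 2: take the square root of the proportions
y_hel <- sqrt(p)

# The squared Hellinger values are the proportions again,
# so they sum to 1 within each sample
rowSums(y_hel^2)
```

The result should agree with `vegan::decostand(y, method = "hellinger")`. Note that each transformed value depends on the whole row, so a species' Hellinger value changes whenever any other species in the same sample changes.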

The way it is presented in the linked paper (Legendre & Gallagher, 2001) is very much in the context of applying certain linear methods to transformed data such that "non-linear" analyses can be performed.

I would argue that your conclusion re sp1 is not self-evident from the plot you show; to my eye there is a tendency for larger abundances towards the middle of envvar. But that is irrelevant in the context of the Hellinger transformed data, because now we must consider the relative composition, and hence we must also consider sp2. It is clear that sp2 increases in abundance with increasing values of envvar. Even assuming that sp1 has no relationship with envvar at all, sp2's relative contribution to the composition of the community must increase, and therefore that of sp1 must decrease, as a function of increasing envvar. This is consistent with the panels containing Hellinger transformed data that you show.
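The compositional argument can be checked with a small simulation. This is only a sketch on simulated data, not the question's dataset: the Poisson counts, the rate parameters and the variable names other than `envvar`, `sp1` and `sp2` are assumptions made for illustration.

```r
set.seed(1)
n <- 200
envvar <- runif(n, 0, 10)

# Assumed data-generating process:
# sp1 counts are unrelated to envvar, sp2 counts increase with envvar
sp1 <- rpois(n, lambda = 20)
sp2 <- rpois(n, lambda = exp(1 + 0.3 * envvar))

# Hellinger transformation for a two-species community
total   <- sp1 + sp2
sp1_hel <- sqrt(sp1 / total)
sp2_hel <- sqrt(sp2 / total)

cor(envvar, sp1)      # raw sp1: expected to be close to zero
cor(envvar, sp1_hel)  # Hellinger sp1: expected to be clearly negative
cor(envvar, sp2_hel)  # Hellinger sp2: expected to be clearly positive
```

Under these assumptions, sp1 picks up a negative relationship with `envvar` after transformation purely because its share of the sample total shrinks as sp2 grows.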
