While working with a multivariate dataset, I noticed that Hellinger transformation reveals relationships in the dataset that I did not see otherwise. The fact that a transformation affects species abundance – environmental variable relationships is, of course, nothing new. However, I am a bit skeptical about relationships that only appear after Hellinger transformation, since I do not completely understand how the data are transformed. OK, the equation is simple enough (stated for example here), but the transformation itself is not as intuitive as, for example, a log transformation. Here is an example using a subset of my dataset in R (looking at the plot should be enough to answer this question, but I paste a reproducible example in case someone wants it):
Copy the dataset from here and assign to x
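# Copies of the raw counts; these are overwritten with Hellinger values below
x$sp1.hel <- x$sp1.raw
x$sp2.hel <- x$sp2.raw
# log(x + 1) transformation
x$sp1.log <- log(x$sp1.raw + 1)
x$sp2.log <- log(x$sp2.raw + 1)
library(vegan)
# Hellinger transformation, computed per row across the two species
x[c("sp1.hel", "sp2.hel")] <- vegan::decostand(x[c("sp1.hel", "sp2.hel")], method = "hellinger")
library(reshape2)
# Long format: one row per (envvar, variable) pair
mx <- reshape2::melt(x, id.vars = "envvar")
# Split names such as "sp1.hel" into species ("sp1") and transformation ("hel")
mx$trans <- factor(sapply(strsplit(as.character(mx$variable), "\\."), "[", 2))
# Factor levels sort alphabetically: "hel", "log", "raw"
levels(mx$trans) <- c("Hellinger", "Log + 1", "None")
mx$species <- factor(sapply(strsplit(as.character(mx$variable), "\\."), "[", 1))
library(ggplot2)
# One panel per transformation (rows) and species (columns)
ggplot(mx, aes(x = envvar, y = value, color = species)) +
  geom_point(size = 1, alpha = 0.2) +
  geom_smooth() +
  facet_grid(trans ~ species, scales = "free") +
  theme_bw()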
As you can see, there is probably no relationship between sp1 abundance and envvar in non-transformed or log + 1 transformed data, but such a relationship becomes apparent for Hellinger transformed data. Sp2 seems to correlate somewhat positively with envvar regardless of the transformation (see cor(x)).
If I were to Hellinger transform my dataset for a constrained multivariate ordination (RDA / CCA), I would probably end up concluding that sp1 and envvar have a rather strong relationship, even though such a relationship is not evident for non-transformed or log-transformed data.
Therefore I am wondering whether Hellinger transformation is a suitable method to examine species community – environmental variable relationships in constrained multivariate ordinations?
Best Answer
The Hellinger transformation of a vector of abundances is
$$ y_{ij}^{\prime} = \sqrt{\frac{y_{ij}}{y_{i+}}} $$
where I'm using $y_{i+}$ to indicate the sample total count over all $j = 1, \dots, m$ species, for the $i$th sample.
The first part, the fraction, turns abundances into proportional values out of the sample total count. In other words we are now very much thinking about relative species composition, not the magnitudes of abundance values.
The square root element is a commonly-applied transformation for moderately skewed data. Here it is reducing the effect of those $y_{ij}$ values that may be extremely large. It is similar in effect to the log transformation but not as strong a transformation.
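The two steps described above can be sketched in base R. This is only a minimal illustration on a made-up count matrix (the object `y` and its values are hypothetical, not taken from your data); it should give the same values as `vegan::decostand(y, method = "hellinger")`.

```r
# Hypothetical toy counts: 3 samples (rows) by 2 species (columns)
y <- matrix(c(10,  0,
              40, 10,
               5, 45),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("sp1", "sp2")))

# Step 1: divide each count by its sample (row) total to get proportions
prop <- y / rowSums(y)

# Step 2: take the square root of those proportions
hel <- sqrt(prop)

hel
# The squared Hellinger values are the proportions, so each row sums to 1
rowSums(hel^2)
```

Note that the raw sample totals (10, 50 and 50 here) no longer appear anywhere in `hel`; only the relative composition of each sample is retained.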
The way it is presented in the linked paper (Legendre & Gallagher, 2001) is very much in the context of applying certain linear methods to transformed data such that "non-linear" analyses can be performed.
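As I understand that paper, the point is that ordinary Euclidean distance computed on the transformed values equals the Hellinger distance between the original samples. Written out, with $k$ used here (my notation, not the paper's) to index a second sample:

$$ D_{\text{Hellinger}}(i, k) = \sqrt{\sum_{j=1}^{m} \left( \sqrt{\frac{y_{ij}}{y_{i+}}} - \sqrt{\frac{y_{kj}}{y_{k+}}} \right)^2} = \sqrt{\sum_{j=1}^{m} \left( y_{ij}^{\prime} - y_{kj}^{\prime} \right)^2} $$

So a method that implicitly preserves Euclidean distance, such as PCA or RDA, preserves the Hellinger distance among samples when it is run on the $y_{ij}^{\prime}$.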
I would argue that your conclusion regarding sp1 is not self-evident from the plot you show; to my eye there is a tendency for larger abundances towards the middle of envvar. But that is irrelevant in the context of the Hellinger transformed data, because now we must consider the relative composition and hence we must also consider sp2. It is clear that this species increases in abundance with increasing values of envvar. Assuming that sp1 has no relationship with envvar at all, sp2's relative contribution to the composition of the community must increase, and therefore that of sp1 must decrease, as a function of increasing envvar. This is consistent with the panels containing Hellinger transformed data that you show.
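A small deterministic sketch of this argument, using made-up numbers rather than your data: sp1 is held constant (so it has no relationship with envvar at all) while sp2 increases with envvar. The column names mirror yours, but the values are hypothetical.

```r
# Hypothetical data: sp1 is constant, sp2 rises linearly with envvar
x <- data.frame(envvar = 1:10)
x$sp1.raw <- rep(20, 10)
x$sp2.raw <- 2 * x$envvar

# Hellinger transformation by hand: square root of the row-wise proportions
counts <- as.matrix(x[c("sp1.raw", "sp2.raw")])
hel <- sqrt(counts / rowSums(counts))

# Raw sp1 never changes, yet its Hellinger value falls as envvar increases,
# purely because sp2 takes up a growing share of each sample total
cbind(envvar = x$envvar, round(hel, 3))
all(diff(hel[, "sp1.raw"]) < 0)
```

The apparent negative trend in transformed sp1 here is produced entirely by the change in sp2, which is the same mechanism I am suggesting for your Hellinger panels.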