The Hellinger transformation of a vector of abundances is
$$
y_{ij}^{\prime} = \sqrt{\frac{y_{ij}}{y_{i+}}}
$$
where I'm using $y_{i+}$ to denote the total count of the $i$th sample, summed over all $j = 1, \dots, m$ species.
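As a concrete illustration, here is a minimal NumPy sketch of the formula above, assuming a samples-by-species matrix of counts with no empty samples. The function name `hellinger` and the counts are hypothetical, chosen only for illustration.

```python
import numpy as np


def hellinger(y):
    """Hellinger-transform a samples-by-species abundance matrix.

    Each row (sample i) is divided by its total count y_{i+}, and the
    element-wise square root of those proportions is returned.
    Assumes no sample (row) has a zero total.
    """
    y = np.asarray(y, dtype=float)
    row_totals = y.sum(axis=1, keepdims=True)  # y_{i+} for each sample
    return np.sqrt(y / row_totals)


# Hypothetical counts: 2 samples (rows) x 3 species (columns)
counts = np.array([[10, 0, 90],
                   [1, 1, 2]])
h = hellinger(counts)
print(h.round(3))
```

Because the squared transformed values are just the proportions, each row of `h` has unit length, which is what makes Euclidean geometry on these values meaningful.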
The first part, the fraction, converts abundances into proportions of the sample total count. In other words, we are now very much thinking about relative species composition, not the magnitudes of the abundance values.
The square root is a commonly applied transformation for moderately skewed data. Here it reduces the influence of the very large $y_{ij}$ values (the dominant species) relative to the smaller ones. It is similar in effect to the log transformation, but not as strong.
The way it is presented in the linked paper (Legendre & Gallagher, 2001) is very much in the context of applying linear methods such as PCA or RDA to the transformed data, such that "non-linear" analyses can be performed.
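One way to see why this works: the Euclidean distance between two Hellinger-transformed samples is the Hellinger distance between them, and PCA (a Euclidean method) rotates the data without changing those distances. The sketch below, with hypothetical counts and helper names, runs PCA via an SVD of the column-centred transformed data and checks that distances between sample scores match the Hellinger distances computed directly.

```python
import numpy as np


def hellinger(y):
    """Row-wise square root of proportions (assumes no empty rows)."""
    y = np.asarray(y, dtype=float)
    return np.sqrt(y / y.sum(axis=1, keepdims=True))


def hellinger_distance(a, b):
    """Hellinger distance between two abundance vectors."""
    pa, pb = a / a.sum(), b / b.sum()
    return np.sqrt(np.sum((np.sqrt(pa) - np.sqrt(pb)) ** 2))


def pca_scores(x):
    """Sample scores on all principal axes of column-centred x."""
    xc = x - x.mean(axis=0)              # centre each species (column)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return u * s                         # equals xc @ vt.T, a pure rotation


# Hypothetical counts: 5 samples x 4 species
counts = np.array([[12, 0, 3, 40],
                   [5, 8, 0, 22],
                   [0, 15, 6, 9],
                   [30, 2, 1, 5],
                   [7, 7, 7, 7]], dtype=float)

scores = pca_scores(hellinger(counts))
d_pca = np.linalg.norm(scores[0] - scores[1])
d_hel = hellinger_distance(counts[0], counts[1])
print(d_pca, d_hel)  # agree up to floating-point error
```

In practice one keeps only the first few axes, which then give the best low-dimensional Euclidean approximation of the Hellinger distances.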
I would argue that your conclusion regarding `sp1` is not self-evident from the plot you show; to my eye there is a tendency for larger abundances towards the middle of `envvar`. But that is irrelevant in the context of the Hellinger-transformed data, because now we must consider the relative composition, and hence we must also consider `sp2`. It is clear that this species increases in abundance with increasing values of `envvar`, so, assuming that `sp1` has no relationship with `envvar` at all, the relative contribution of `sp2` to the composition of the community must increase, and therefore that of `sp1` must decrease, as `envvar` increases. This is consistent with the panels of Hellinger-transformed data that you show.
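A small simulation makes the argument concrete. Assume (hypothetically) that `sp1` has a constant count of 20 regardless of `envvar`, while `sp2` increases linearly with `envvar`; the values are invented for illustration and are not your data.

```python
import numpy as np

envvar = np.linspace(0, 10, 6)            # hypothetical gradient values
sp1 = np.full_like(envvar, 20.0)          # no relationship with envvar
sp2 = 2.0 + 5.0 * envvar                  # increases with envvar
counts = np.column_stack([sp1, sp2])

# Hellinger transformation: square root of each species' share of the row total
hel = np.sqrt(counts / counts.sum(axis=1, keepdims=True))
print(np.round(hel, 3))
```

Even though the raw count of `sp1` never changes, its Hellinger-transformed value falls steadily along `envvar`, purely because `sp2` takes up a growing share of each sample.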
I have been searching for answers to this question too, and came across this very useful discussion from years ago:
[ORDNEWS:1593] log, sqrt and other transformation with Bray-Curtis dissimilarity
The purpose of a square-root transformation seems to be to reduce the relative influence of the most abundant species, which would otherwise tend to dominate the dissimilarity matrix and which are also often quite variable in number (according to the discussion). Furthermore, we may be somewhat more interested in the rarer species. An even stronger downweighting can be achieved using log(1+x).
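To see the difference in strength, here is a quick comparison on a hypothetical set of counts spanning three orders of magnitude. It reports how much larger the most abundant value is than the rarest after each transformation; `np.log1p` computes log(1+x).

```python
import numpy as np

counts = np.array([1, 10, 100, 1000], dtype=float)  # hypothetical: rarest -> most abundant
transforms = {"raw": lambda x: x, "sqrt": np.sqrt, "log1p": np.log1p}

ratios = {}
for name, f in transforms.items():
    t = f(counts)
    ratios[name] = t[-1] / t[0]  # most abundant relative to rarest
    print(f"{name:>5}: most abundant / rarest = {ratios[name]:.1f}")
```

The ratio drops from 1000 (raw) to about 32 (square root) to about 10 (log1p), so the dominant species' weight in something like a Bray-Curtis sum shrinks accordingly.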
The Wisconsin scaling (double standardization) removes the effects of differences in absolute abundance among species and in total abundance among sites, so everything becomes relative.
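A minimal sketch of Wisconsin double standardization, assuming the usual two steps (as in vegan's `wisconsin()`): divide each species by its maximum, then divide each site by its row total. The counts are hypothetical, and the sketch assumes no species column and no site row is entirely zero.

```python
import numpy as np


def wisconsin(y):
    """Wisconsin double standardization of a sites-by-species matrix."""
    y = np.asarray(y, dtype=float)
    y = y / y.max(axis=0, keepdims=True)     # species: scale by column maximum
    return y / y.sum(axis=1, keepdims=True)  # sites: scale to unit row total


# Hypothetical counts: 3 sites x 3 species
counts = np.array([[100, 2, 0],
                   [50, 4, 1],
                   [0, 1, 1]])
w = wisconsin(counts)
print(np.round(w, 3))

# Multiplying one species' counts by a constant leaves the result unchanged
w_rescaled = wisconsin(counts * np.array([10, 1, 1]))
```

The second call illustrates the first half of the sentence above: a species that is uniformly ten times more abundant contributes exactly the same after standardization.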
The Bray-Curtis measure outperforms other measures in many cases, and only compares species that are present at at least one of the two sites, which means that double zeros are (correctly) ignored.
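The double-zero property follows directly from the formula: Bray-Curtis is the sum of absolute differences divided by the sum of both sites' abundances, so a species absent from both sites adds zero to the numerator and the denominator. A small check with hypothetical site vectors:

```python
import numpy as np


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.abs(a - b).sum() / (a + b).sum()


site1 = np.array([3, 0, 5, 2])   # hypothetical counts
site2 = np.array([1, 4, 5, 0])
d = bray_curtis(site1, site2)

# Append three species absent from both sites ("double zeros")
d_padded = bray_curtis(np.r_[site1, 0, 0, 0], np.r_[site2, 0, 0, 0])

# Sites sharing no species are maximally dissimilar
d_disjoint = bray_curtis([4, 0, 0], [0, 2, 6])
print(d, d_padded, d_disjoint)
```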
I suspect that the default scaling in `metaMDS` is well founded; I just wish it were a bit more transparent.
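On transparency: my reading of vegan's `?metaMDS` help page is that, with `autotransform = TRUE`, the data are square-root transformed if their maximum value exceeds 50, and Wisconsin double-standardized if the maximum exceeds 9. The Python sketch below mimics only that heuristic (the function name `autotransform_sketch` is hypothetical, and vegan may apply further checks), so please confirm against the help page for your vegan version.

```python
import numpy as np


def wisconsin(y):
    """Divide species by column maxima, then sites by row totals."""
    y = np.asarray(y, dtype=float)
    y = y / y.max(axis=0, keepdims=True)
    return y / y.sum(axis=1, keepdims=True)


def autotransform_sketch(comm):
    """Hypothetical mimic of the metaMDS autotransform heuristic.

    Thresholds (50 and 9) are taken from my reading of the vegan docs.
    Both tests use the maximum of the *untransformed* data.
    """
    comm = np.asarray(comm, dtype=float)
    xam = comm.max()
    steps = []
    if xam > 50:
        comm = np.sqrt(comm)
        steps.append("sqrt")
    if xam > 9:
        comm = wisconsin(comm)
        steps.append("wisconsin")
    return comm, steps


# Hypothetical community matrices with different maxima
examples = {
    "max 100": np.array([[100, 3, 1], [20, 5, 2]]),
    "max 20": np.array([[20, 3, 1], [4, 5, 2]]),
    "max 5": np.array([[5, 3, 1], [4, 5, 2]]),
}
applied = {name: autotransform_sketch(x)[1] for name, x in examples.items()}
print(applied)
```

If you would rather control this yourself, setting `autotransform = FALSE` and transforming the data explicitly makes the analysis easier to report.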
The Hellinger transformation is defined as
$$ y^{\prime}_{ij} = \sqrt{\frac{y_{ij}}{y_{i.}}} $$
where $j$ indexes the species, $i$ the site/sample, and $y_{i.}$ is the row sum for the $i$th sample.
If your data are already of the form $\frac{y_{ij}}{y_{i.}}$ (proportions of the full row total) but you've only taken a subset of the species, then yes, you can just apply a square-root transformation to the data you are using; the result is the same as if you had applied the Hellinger transformation to the entire data set and then thrown out the unwanted species.
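A quick numerical check of that equivalence, using hypothetical counts and a hypothetical subset of species. It also shows the one thing to avoid: re-closing the subset (dividing by the subset's own total) before taking the square root gives different values.

```python
import numpy as np

# Hypothetical counts: 3 samples x 5 species
counts = np.array([[12, 0, 3, 40, 5],
                   [5, 8, 0, 22, 1],
                   [0, 15, 6, 9, 3]], dtype=float)
keep = [0, 2, 3]  # hypothetical subset of species columns

# Proportions relative to the FULL row total y_{i.}
props = counts / counts.sum(axis=1, keepdims=True)

route_a = np.sqrt(props[:, keep])                       # subset proportions, then sqrt
route_b = np.sqrt(counts / counts.sum(axis=1, keepdims=True))[:, keep]  # full Hellinger, then subset

# Not equivalent: proportions re-computed from the subset's own totals
sub = counts[:, keep]
route_c = np.sqrt(sub / sub.sum(axis=1, keepdims=True))
print(np.allclose(route_a, route_b), np.allclose(route_a, route_c))
```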
If you have a large number of taxa, I have found in my experience that applying the Hellinger transformation (or just the square root, for data that are already proportional abundances) is an improvement over analysing the percentage (or proportional) abundance data directly.