I have species relative abundance data (as percentages) and several environmental parameters. I have run normality tests on my data and it all appears to be normally distributed. Do I still need to log-transform the data? An online tutorial for CCA said to, but I would like to be sure.
Do I always need to log-transform the data to do a canonical correspondence analysis?
Related Solutions
I don't think that using CCA will help you. It appears to me that you have a number of endogenous series (the abundances of n species) and a number of exogenous series (m food resources). I would suggest constructing n transfer functions, each one optimized to fully use the information content in the m supporting series and their lags where appropriate, while incorporating an unspecified stochastic structure (ARMA) and unspecified deterministic structure such as level shifts and local time trends. Putting these n equations under a "statistical microscope" might reveal commonalities that suggest grouping the n equations into subsets.
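A heavily simplified sketch of the per-species idea, in Python with NumPy: one ordinary least-squares distributed-lag regression per species on the current and lagged values of the m exogenous series. It deliberately leaves out the ARMA noise model and the detection of level shifts and local trends described above, so it is only a starting point; the function names `lagged_design` and `fit_species_models` and the `max_lag` parameter are hypothetical.

```python
import numpy as np


def lagged_design(exog, max_lag):
    """Design matrix: an intercept plus each exogenous series at lags 0..max_lag.

    exog has shape (T, m), one column per exogenous (e.g. food resource) series.
    Returns an array of shape (T - max_lag, 1 + m * (max_lag + 1)).
    """
    T, m = exog.shape
    cols = [np.ones(T - max_lag)]
    for j in range(m):
        for lag in range(max_lag + 1):
            # Row k corresponds to time t = max_lag + k, so this picks exog[t - lag, j].
            cols.append(exog[max_lag - lag : T - lag, j])
    return np.column_stack(cols)


def fit_species_models(endog, exog, max_lag=1):
    """Fit one least-squares distributed-lag model per species (column of endog).

    Simplification: no ARMA error structure and no level-shift/trend terms.
    """
    X = lagged_design(exog, max_lag)
    coefs = []
    for i in range(endog.shape[1]):
        y = endog[max_lag:, i]  # drop the first max_lag rows, which lack full lags
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs.append(beta)
    # Shape (n, 1 + m * (max_lag + 1)); column order: intercept, then each series' lags.
    return np.array(coefs)
```

Comparing the rows of the returned coefficient matrix across species is one crude way to look for the "commonalities" mentioned above.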
The Hellinger transformation of a vector of abundances is
$$ y_{ij}^{\prime} = \sqrt{\frac{y_{ij}}{y_{i+}}} $$
where I'm using $y_{i+}$ to indicate the sample total count over all $j = 1, \dots, m$ species, for the $i$th sample.
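The formula above translates directly into a few lines of NumPy. This is a sketch assuming a samples-by-species matrix (rows are samples $i$, columns are species $j$); the function name `hellinger` is hypothetical, and returning zeros for an all-zero sample is an assumption, since the formula is undefined when $y_{i+} = 0$.

```python
import numpy as np


def hellinger(Y):
    """Hellinger-transform a samples x species abundance matrix.

    Each y_ij is divided by its sample (row) total y_i+ and then square-rooted.
    Samples whose total is zero are returned as rows of zeros (an assumption).
    """
    Y = np.asarray(Y, dtype=float)
    row_totals = Y.sum(axis=1, keepdims=True)
    # Suppress the 0/0 warning for empty samples; np.where replaces those entries.
    with np.errstate(divide="ignore", invalid="ignore"):
        P = np.where(row_totals > 0, Y / row_totals, 0.0)
    return np.sqrt(P)


H = hellinger([[10, 30, 60], [1, 1, 2]])
# Each transformed row has unit length: the squared values are the relative proportions.
```

In R, the vegan package provides the same transformation via `decostand(x, method = "hellinger")`.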
The first part, the fraction, turns abundances into proportional values out of the sample total count. In other words we are now very much thinking about relative species composition, not the magnitudes of abundance values.
The square root is a commonly applied transformation for moderately skewed data. Here it reduces the influence of any $y_{ij}$ values that are extremely large. Its effect is similar to that of the log transformation, but weaker.
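To see how much weaker the square root is than the log, compare how far apart a set of skewed values remains after each transformation. The counts below are made up for illustration, and `log(x + 1)` (`np.log1p`) is used as the log transformation, an assumption chosen because it stays finite at zero.

```python
import numpy as np

counts = np.array([1, 10, 100, 1000], dtype=float)  # illustrative, strongly skewed values


def spread(x):
    """Ratio of the largest to the smallest value: a crude measure of dispersion."""
    return x.max() / x.min()


raw_spread = spread(counts)             # 1000.0
sqrt_spread = spread(np.sqrt(counts))   # about 31.6
log_spread = spread(np.log1p(counts))   # about 9.97
```

Both transformations pull the largest value back towards the rest, but the log does so much more aggressively than the square root.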
In the linked paper (Legendre & Gallagher, 2001), it is presented in the context of applying certain linear methods to transformed data so that "non-linear" analyses can be performed.
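As an illustration of applying a linear method to Hellinger-transformed data, here is a PCA computed via the singular value decomposition of the column-centred transformed matrix. PCA is used only as one example of such a method; `hellinger_pca` is a hypothetical helper, and it assumes every sample has a non-zero total.

```python
import numpy as np


def hellinger_pca(Y):
    """PCA (via SVD) of Hellinger-transformed abundances.

    Returns the sample scores and the proportion of variance on each axis.
    Assumes no sample (row) sums to zero.
    """
    Y = np.asarray(Y, dtype=float)
    H = np.sqrt(Y / Y.sum(axis=1, keepdims=True))  # Hellinger transformation
    Hc = H - H.mean(axis=0)                        # centre each species column
    U, s, Vt = np.linalg.svd(Hc, full_matrices=False)
    scores = U * s                                 # sample coordinates on the PC axes
    var_explained = s**2 / np.sum(s**2)            # sorted largest first by the SVD
    return scores, var_explained


scores, var_explained = hellinger_pca([[10, 0, 5], [8, 2, 5], [0, 10, 1], [1, 9, 0]])
```

Because the PCA is only a rotation of the centred data, Euclidean distances between the sample scores equal the Euclidean distances between the Hellinger-transformed samples, which is the Hellinger distance (some definitions add a constant factor).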
I would argue that your conclusion regarding sp1 is not self-evident from the plot you show; to my eye there is a tendency for larger abundances towards the middle of envvar. But that is irrelevant in the context of the Hellinger-transformed data, because we must now consider the relative composition, and hence we must also consider sp2. This species clearly increases in abundance with increasing values of envvar. Assuming that sp1 has no relationship with envvar at all, sp2's relative contribution to the composition of the community must increase, and therefore that of sp1 must decrease, as envvar increases. This is consistent with the panels of Hellinger-transformed data that you show.
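The argument can be checked with a small deterministic simulation: sp1 is held constant, sp2 increases linearly with envvar, and the Hellinger-transformed value of sp1 then falls as envvar increases even though its raw abundance never changes. The abundances and the linear relationship are invented purely for illustration.

```python
import numpy as np

envvar = np.linspace(0, 10, 6)
sp1 = np.full_like(envvar, 20.0)   # hypothetical: no relationship with envvar
sp2 = 5.0 + 4.0 * envvar           # hypothetical: increases with envvar
Y = np.column_stack([sp1, sp2])    # samples x species

# Hellinger transformation: divide by each sample's total, then take the square root.
H = np.sqrt(Y / Y.sum(axis=1, keepdims=True))
sp1_h, sp2_h = H[:, 0], H[:, 1]
# sp1_h decreases along envvar while sp2_h increases, purely through composition.
```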
Best Answer
CCA is sensitive to outliers and assumes that species responses are symmetric, unimodal functions of position along environmental gradients. Hypothesis testing is based on randomization (permutation) tests, so it makes no distributional assumptions. But, CCA or not, transformations should be applied only if they improve the data distribution, as demonstrated with normality tests or a probability plot correlation coefficient (PPCC) fit.
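One rough way to apply that rule: compute the normal PPCC before and after a candidate transformation and keep the transformation only if the fit improves. This is a sketch; the Blom plotting positions are an assumption (Filliben's original PPCC test uses slightly different ones), the helper names `ppcc_normal` and `log_helps` are hypothetical, and it gives only a point comparison, not a formal test against PPCC critical values. Note that the log requires strictly positive data, which percentages containing zeros are not.

```python
from statistics import NormalDist

import numpy as np


def ppcc_normal(x):
    """Normal probability plot correlation coefficient (PPCC).

    Correlates the sorted sample with standard-normal quantiles at Blom
    plotting positions (i - 3/8) / (n + 1/4). Values closer to 1 mean a
    better fit to the normal distribution.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    positions = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    quantiles = np.array([NormalDist().inv_cdf(p) for p in positions])
    return np.corrcoef(x, quantiles)[0, 1]


def log_helps(x):
    """True if log-transforming (strictly positive) data raises the normal PPCC."""
    x = np.asarray(x, dtype=float)
    return ppcc_normal(np.log(x)) > ppcc_normal(x)
```

If your data already pass the normality checks, as described in the question, a comparison like this would typically show little or no gain from the log transformation. SciPy's `scipy.stats.probplot` also reports this correlation as `r` alongside the fitted line.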