Ecology – Analyzing Community Composition in Relation to Environmental Variables with nMDS

ecology, multidimensional scaling, multivariate analysis, r

I have a large data set (over 1000 observations) with abundances of over 60 species at 15 different sites over two years. Each site was divided into 30 sampling points, and each point was sampled four times (replicates). I also have environmental data for each site, but these were measured only once, so I have no replicates for them as I do for the abundance data.

I want to find out whether community composition differs between sites and how it relates to the environmental data. I plan to use non-metric multidimensional scaling (nMDS).

Question 1: Do I need to test data for normality first? If so, how for this kind of data?

When I tried to run nMDS, it treated every row as a separate site, so I ended up with over 800 points, but I only want one point for each site-year combination.

Question 2: Do I need to average my abundance data for the sampling points and replicates at each site before nMDS?

Question 3: How can I incorporate the environmental data into my nMDS?

Any help would be much appreciated, as I am quite confused!

Many thanks,

Best Answer

If you want to produce an NMDS plot with one point for each site, you will first need to pool your sampling points to produce a single community for each site. You could produce a separate plot for each year, or put both years on the same plot with one labelled point per site-year, e.g. site1_year1, site1_year2, etc.
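The pooling step can be sketched independently of any ordination package. The following is a minimal Python illustration, assuming the raw data are in long format with one record per observation; the record layout, the site and species names, and the helper names `pool_by_site_year` and `to_matrix` are all hypothetical. It sums counts over sampling points and replicates so that each site-year becomes one row of a community matrix.

```python
from collections import defaultdict

# Hypothetical long-format records:
# (site, year, sampling_point, replicate, species, count)
records = [
    ("A", 2019, 1, 1, "sp1", 3),
    ("A", 2019, 1, 2, "sp1", 1),
    ("A", 2019, 2, 1, "sp2", 4),
    ("A", 2020, 1, 1, "sp1", 2),
    ("B", 2019, 1, 1, "sp2", 5),
    ("B", 2019, 1, 2, "sp3", 1),
]


def pool_by_site_year(rows):
    """Sum counts over sampling points and replicates for each (site, year)."""
    pooled = defaultdict(lambda: defaultdict(int))
    for site, year, _point, _replicate, species, count in rows:
        pooled[(site, year)][species] += count
    return pooled


def to_matrix(pooled):
    """Return row labels, species columns and a site-year x species matrix."""
    species = sorted({sp for counts in pooled.values() for sp in counts})
    labels = sorted(pooled)
    # Species never recorded at a site-year get an explicit zero.
    matrix = [[pooled[key].get(sp, 0) for sp in species] for key in labels]
    return labels, species, matrix


labels, species, matrix = to_matrix(pool_by_site_year(records))
for (site, year), row in zip(labels, matrix):
    print(f"{site}_{year}", row)
```

The resulting matrix has one row per site-year, which is the shape an ordination routine expects if each site-year should appear as a single point. Summing only makes sense if sampling effort was the same everywhere; with unequal effort, averaging or standardising would be the safer choice.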

Alternatively, you could keep your data having one row for each sampling point. You could then plot all of the sampling points, and give each point a colour corresponding to which site it is from. This will allow you to visualise whether sampling points from the same site cluster together. Check out vignettes from the R package vegan for examples of how to do this.

I'm not clear on what the point of the replication was. You could simply pool your replicates to give a single row per sampling point.

It sounds like sampling intensity was identical between sampling points and sites, but you might want to think about this to make sure.

Once you have some NMDS plots, you can fit your environmental variables onto them using the envfit function from vegan. It uses permutations to test whether the correlations are significant, so the data do not need to be normally distributed.
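To show why a permutation test removes the need for a normality assumption, here is a rough Python sketch in the spirit of that kind of test. It is not vegan's implementation: the function names, the ordination scores and the pH values are all made up for illustration. It regresses one environmental variable on two ordination axes, records R², and then repeatedly shuffles the environmental values to see how often chance alone gives an R² at least as large.

```python
import random


def r_squared(scores, values):
    """R^2 of a least-squares fit of `values` on two ordination axes."""
    n = len(values)
    mean_x = sum(s[0] for s in scores) / n
    mean_y = sum(s[1] for s in scores) / n
    mean_v = sum(values) / n
    x = [s[0] - mean_x for s in scores]
    y = [s[1] - mean_y for s in scores]
    v = [val - mean_v for val in values]
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxv = sum(a * c for a, c in zip(x, v))
    syv = sum(b * c for b, c in zip(y, v))
    svv = sum(c * c for c in v)
    # Solve the 2x2 normal equations for the two slopes.
    det = sxx * syy - sxy * sxy
    b1 = (sxv * syy - syv * sxy) / det
    b2 = (syv * sxx - sxv * sxy) / det
    return (b1 * sxv + b2 * syv) / svv


def permutation_p(scores, values, n_perm=999, seed=1):
    """Permutation p-value: share of shuffles with R^2 >= the observed R^2."""
    rng = random.Random(seed)
    observed = r_squared(scores, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(scores, shuffled) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical NMDS scores (axis 1, axis 2) for six site-years,
# and a hypothetical environmental variable (e.g. soil pH).
scores = [(-1.2, 0.3), (-0.7, -0.4), (-0.1, 0.5),
          (0.4, -0.6), (0.9, 0.1), (1.5, 0.2)]
soil_ph = [5.1, 5.6, 5.9, 6.4, 6.6, 7.3]

r2, p_value = permutation_p(scores, soil_ph)
print(f"R^2 = {r2:.3f}, permutation p = {p_value:.3f}")
```

The p-value comes entirely from reshuffling the observed values, so no distributional assumption enters the test. With only one environmental measurement per site, the environmental values would need to be matched to the pooled site rows before fitting.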

If you want to test for the effect of specific environmental variables, you will need to take spatial autocorrelation into account: sites that are far apart are likely to differ more in community composition and environmental variables than sites that are close together. You can do this with partial Mantel tests. Just as your community data are converted into a distance matrix for NMDS, you construct a distance matrix for your sites based on geographic distance. The partial Mantel test then partials out the effect of geographic distance to show whether your environmental variables are still important.
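The logic of a partial Mantel test can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a replacement for a tested implementation: the community counts, pH values and coordinates are invented, the helper names are hypothetical, the coordinates are assumed to be already projected (so straight-line distance is meaningful), and Bray-Curtis is assumed as the community dissimilarity. The test correlates community and environmental distances after removing what each shares with geographic distance, and assesses significance by permuting the rows and columns of the community matrix together.

```python
import math
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def distance_matrix(rows, dist):
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def lower_triangle(m, order=None):
    """Flatten the lower triangle, optionally after reordering the sites."""
    idx = order if order is not None else list(range(len(m)))
    return [m[idx[i]][idx[j]] for i in range(1, len(m)) for j in range(i)]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def partial_r(x, y, z):
    """Correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(comm_d, env_d, geo_d, n_perm=999, seed=1):
    rng = random.Random(seed)
    y, z = lower_triangle(env_d), lower_triangle(geo_d)
    observed = partial_r(lower_triangle(comm_d), y, z)
    order = list(range(len(comm_d)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute sites, i.e. rows and columns together
        if partial_r(lower_triangle(comm_d, order), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical pooled data for five sites.
community = [[10, 2, 0], [8, 3, 1], [3, 6, 4], [1, 7, 6], [0, 4, 9]]
soil_ph = [5.1, 5.4, 6.3, 6.8, 7.2]
coords_km = [(0, 0), (1, 2), (4, 1), (5, 5), (2, 6)]  # assumed projected

comm_d = distance_matrix(community, bray_curtis)
env_d = distance_matrix(soil_ph, lambda a, b: abs(a - b))
geo_d = distance_matrix(coords_km, math.dist)

r, p_value = partial_mantel(comm_d, env_d, geo_d)
print(f"partial Mantel r = {r:.3f}, p = {p_value:.3f}")
```

Permuting whole sites rather than individual distances is what keeps the test valid, because the entries of a distance matrix are not independent of one another. For latitude and longitude in degrees, a great-circle distance would be needed in place of `math.dist`.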

You could also carry out an exploratory partial Mantel analysis, assessing the independent importance of each matrix of related environmental variables with the effects of the other matrices removed. This involves testing each matrix in turn against community composition while the remaining matrices are partialled out.

P.S. Short answers to your questions:

Do I need to test for normality first? - No.

Do I need to average my abundance data for the sampling points and replicates? - Yes, but I would pool (sum) them rather than average.

How can I incorporate the environmental data into my nMDS? - Use envfit and partial Mantel tests.
