Solved – What do ellipses of PCA analysis (factoextra) mean

biostatistics, pca, r

Biological question: do plants that have been attacked by insects differ in their characteristics from plants that have not been attacked by insects?

The data frame includes six numerical variables describing different plant traits and one categorical variable defining plant status (with insects / without insects).

To answer this I performed a PCA analysis in R using the "factoextra" package. I plotted individuals with "fviz_pca_ind" function separated by the category "with insects" and "without insects" and included ellipses.

My problem: what does the argument "ellipse.level = 0.68" from "fviz_pca_ind" function mean? Confidence intervals? Standard deviations? Or simply percentage of individuals inside the ellipse?

library(factoextra)

# PCA on the six numeric trait columns, standardised to unit variance
res.pca <- prcomp(PCA2, scale. = TRUE)

# categorical variable (with insects, without insects)
treat <- as.factor(treat)

# individuals plot, coloured by group, with one ellipse per group
fviz_pca_ind(res.pca,
             pointsize = 2,
             geom = "point",
             palette = c("palegreen3", "black"),
             habillage = treat,
             addEllipses = TRUE,
             ellipse.level = 0.68) +
  theme_minimal()

My next question is: which analysis would you recommend to test whether these two groups are statistically different? (In this case they are not, but anyway.) Thanks!

Best Answer

Welcome to the site, AndresD.

I would suggest building your PCA with the rda function from the vegan package instead, because vegan provides the ordihull family of functions (ordihull, ordiellipse, ordispider) for marking groups and their centroids on an ordination plot - here is a tutorial. From what I can see, the documentation for these functions is a little clearer than that of the factoextra package: it explains that the ellipse drawn by ordiellipse is scaled by a confidence limit (for example 95%), which you can change manually if desired.

The second reason I point you to vegan is the adonis2 function (don't use adonis, it is deprecated). It carries out a permutational multivariate analysis of variance (PERMANOVA), which tests whether the two group centroids are statistically different from one another. Note that it works on the trait data (or a distance matrix computed from them) rather than on the PCA object, so it tests whether the two groups differ in n-dimensional space (i.e. do their traits differ).

PERMANOVA does carry the assumption of homogeneity of multivariate dispersions, which is analogous to the equality-of-variance assumption that (univariate) linear models hold; you can test this using the vegan::betadisper function. The data you showed likely violate this assumption, though, and the groups probably aren't different, as you pointed out...
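A minimal sketch of that workflow, in case it helps. The data here are simulated stand-ins (the names traits and status are hypothetical, not your PCA2 and treat objects), and the choice of Euclidean distance on standardised traits is my assumption to stay consistent with a scaled PCA; other distances may suit your data better.

```r
library(vegan)

# Simulated stand-in for the real data (hypothetical): six numeric traits
# for 40 plants and a two-level status factor.
set.seed(1)
traits <- as.data.frame(matrix(rnorm(40 * 6), ncol = 6))
names(traits) <- paste0("trait", 1:6)
status <- factor(rep(c("with insects", "without insects"), each = 20))

# PCA via rda(); scale = TRUE standardises each trait to unit variance.
pca <- rda(traits, scale = TRUE)

# Plot the individuals ("sites") and add one ellipse per group.
# kind = "sd" describes the spread of the points; kind = "se" would instead
# describe uncertainty in the group centroid. conf rescales the ellipse
# using a chi-squared quantile with 2 degrees of freedom.
plot(pca, display = "sites", type = "points")
ordiellipse(pca, groups = status, kind = "sd", conf = 0.95,
            col = c("palegreen3", "black"))

# PERMANOVA: do the group centroids differ in six-dimensional trait space?
# Assumption: Euclidean distance on standardised traits.
trait_dist <- dist(scale(traits))
perm <- adonis2(trait_dist ~ status,
                data = data.frame(status = status),
                permutations = 999)
print(perm)

# Check homogeneity of multivariate dispersions, the PERMANOVA assumption.
disp <- betadisper(trait_dist, status)
disp_test <- permutest(disp, permutations = 999)
print(disp_test)
```

If permutest suggests that the dispersions differ between groups, a significant PERMANOVA result may reflect a difference in spread rather than in centroid location, so the two outputs are best read together.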

Let me know if anything is unclear.