Biological question: do plants that have been attacked by insects differ in their characteristics from plants that have not been attacked by insects?
The data frame includes six numerical variables describing different plant traits and one categorical variable defining plant status (with insects / without insects).
To answer this I performed a PCA in R using the "factoextra" package. I plotted the individuals with the "fviz_pca_ind" function, grouped by the categories "with insects" and "without insects", and included ellipses.
My problem: what does the argument "ellipse.level = 0.68" of the "fviz_pca_ind" function mean? Confidence intervals? Standard deviations? Or simply the percentage of individuals inside the ellipse?
library(factoextra)
# PCA on the trait columns, standardised to unit variance
res.pca <- prcomp(PCA2, scale. = TRUE)
treat <- as.factor(treat) # categorical variable (with insects, without insects)
# Plot of individuals, coloured by group, with one ellipse per group
fviz_pca_ind(res.pca, pointsize = 2, geom = "point",
             palette = c("palegreen3", "black"), habillage = treat,
             addEllipses = TRUE, ellipse.level = 0.68) +
  theme_minimal()
My next question is: what analyses would you recommend to show whether these two groups are statistically different? (In this case they are not, but anyway.) Thanks!
Best Answer
Welcome to the site, AndresD.
I would suggest you instead use the vegan package, using the rda function to build your PCA, as this package has the ordihull function (documented together with ordiellipse) for outlining your groups on the ordination - here is a tutorial. The documentation for ordihull is a little clearer than that of the factoextra package from what I can see, and it details that the ellipse drawn by ordiellipse can be defined by a 95% confidence limit, which you can change manually if desired. The second reason I point you to vegan is the adonis2 function (don't use adonis, it is deprecated). This allows you to carry out a permutational multivariate analysis of variance (PERMANOVA) to test whether the two group centroids are statistically different from one another. Essentially, this test takes your trait data (or a distance matrix computed from them) and tests whether the two groups differ in n-dimensional space (i.e. do their traits differ). It does carry the assumption of homogeneity of multivariate dispersions, which is analogous to the equality-of-variance assumption that (univariate) linear models make; you can test this using the vegan::betadisper function. The data you showed likely violate this assumption, though, and the groups probably aren't different, as you pointed out. Let me know if anything is unclear.
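To make the workflow concrete, here is a minimal sketch of how it could look. Everything about the data is an assumption: `traits` is a simulated stand-in for your six trait columns and `treat` a made-up grouping factor with 20 plants per group, so swap in your own objects. The choices of `kind = "sd"` with `conf = 0.95` and of Euclidean distances on the standardised traits are mine, for illustration only, not something your data dictate.

```r
library(vegan)

# Hypothetical stand-in data: 40 plants, six numeric traits, two groups.
set.seed(1)
traits <- as.data.frame(matrix(rnorm(40 * 6), ncol = 6,
                               dimnames = list(NULL, paste0("trait", 1:6))))
treat <- factor(rep(c("with insects", "without insects"), each = 20))

# rda() without constraints is a PCA; scale = TRUE standardises the traits.
pca <- rda(traits, scale = TRUE)

# Plot the individuals ("sites" in vegan terms) and add one ellipse per group.
# conf = 0.95 rescales the standard-deviation ellipse to a 95% limit.
plot(pca, display = "sites", type = "points")
ordiellipse(pca, groups = treat, kind = "sd", conf = 0.95,
            col = c("palegreen3", "black"))

# PERMANOVA on Euclidean distances between the standardised trait vectors.
d <- dist(scale(traits))
meta <- data.frame(treat = treat)
perm <- adonis2(d ~ treat, data = meta, permutations = 999)
print(perm)

# Check the homogeneity-of-dispersion assumption with a permutation test.
disp <- betadisper(d, treat)
disp_test <- permutest(disp, permutations = 999)
print(disp_test)
```

If the `betadisper` test is significant, a significant PERMANOVA result may reflect a difference in spread rather than in centroid location, so it is worth reading the two outputs together.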