Solved – Grouping variables with small sample size

Tags: clustering, factor analysis, hierarchical clustering, small-sample

I have a data set with about 50 variables but only 10 observational units. This is due to the underlying science: it is difficult and expensive to increase the sample size. I think many of the variables are highly correlated; in factor-analysis terms, I suspect there are only 5-10 underlying factors. I want to group the variables based on their commonality (covariance or something similar). Is there a way to do this?

My first thoughts were factor analysis and hierarchical clustering. For factor analysis, the sample size is obviously a huge problem. For hierarchical clustering, I simulated data in R with small n and a large number of variables, constructed so that I knew which variables were similar. With such small sample sizes, however, the clusters are very unstable from one simulation to the next and are often spurious.

In short, is there a good way to group variables when your sample size is smaller than your number of variables?

Best Answer

One of the simplest approaches is hierarchical variable clustering using a similarity matrix. By default, I take the similarity matrix to be made up of squared Spearman rank correlation coefficients. This is implemented in the varclus function of the R Hmisc package. As an alternative, I highly recommend sparse principal components analysis, which performs variable clustering and scoring together. There are at least two R packages for this.
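To make the idea concrete, here is a minimal Python sketch of hierarchical variable clustering on squared Spearman correlations. It is not the Hmisc implementation and is not guaranteed to reproduce varclus output. The function name cluster_variables, the n_groups parameter, the use of 1 - rho^2 as the distance, the choice of complete linkage, and the toy data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def cluster_variables(X, n_groups):
    """Group the columns of X by hierarchical clustering on 1 - rho^2.

    X is an (n_observations, n_variables) array with more than two columns;
    n_groups is the number of variable groups to cut the tree into
    (a hypothetical parameter for this sketch).
    """
    rho, _ = spearmanr(X)              # columns are treated as variables
    similarity = np.asarray(rho) ** 2  # squared Spearman rank correlation
    distance = 1.0 - similarity        # assumed conversion to a dissimilarity
    distance = (distance + distance.T) / 2.0  # remove floating-point asymmetry
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="complete")
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Toy data (hypothetical): 10 observations, 6 variables, 2 latent factors.
rng = np.random.default_rng(0)
f1 = np.arange(10.0)
# f2 is chosen to be nearly rank-uncorrelated with f1.
f2 = np.array([0, 9, 1, 8, 2, 7, 3, 6, 4, 5], dtype=float)
X = np.column_stack(
    [f1 + rng.normal(0, 0.3, 10) for _ in range(3)]
    + [f2 + rng.normal(0, 0.3, 10) for _ in range(3)]
)

labels = cluster_variables(X, n_groups=2)
print(labels)  # one group label per variable (column of X)
```

In this toy example the signal is deliberately strong, so the first three variables should share one label and the last three the other. With real data and only 10 observations, the groups can be much less stable, as the question notes, so it is worth checking how much they change under resampling.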