How Does Monte Carlo Simulation Determine the Number of Principal Components in PCA?

eigenvalues, neuroimaging, parallel-analysis, pca

I'm doing a Matlab analysis of MRI data in which I have performed PCA on a 10304×236 matrix, where 10304 is the number of voxels (think of them as pixels) and 236 is the number of timepoints. The PCA gives me 236 eigenvalues and their related coefficients. This is all fine. However, when it comes time to decide how many components to retain, the paper I am replicating says the following (please let me know if any clarification is needed, as this is just a short part of the whole paper):

We then performed Monte Carlo simulations to determine the number of
principal components (PCs) to extract from the nuisance ROI
data for each scan. A null distribution of the expected eigenvalues
was generated separately for the encoding and rest data for each
subject by performing PCA on normally distributed data of equal
rank to the encoding and rest nuisance ROI data. PCs from the
true nuisance ROI data were then selected for a given rest or
encoding scan if their associated eigenvalues exceeded the 99th
confidence interval of the eigenvalues from the Monte Carlo
simulations.

Tambini & Davachi, PNAS 2013, Persistence of hippocampal multivoxel patterns into postencoding rest is related to memory.

I have absolutely no idea what to do here. I am used to choosing components based on cumulative variance explained. My thinking is this, though:

We then performed Monte Carlo simulations to determine the number of
principal components (PCs) to extract from the nuisance ROI
data for each scan.

Monte Carlo simulations just mean doing the following 1000 (or so) times, right?

A null distribution of the expected eigenvalues
was generated by performing PCA on normally distributed data of equal
rank to the encoding and rest nuisance ROI data.

Firstly, I'm assuming 'equal rank' basically means that I will create a matrix the same size as the original (10304×236). In terms of 'normally distributed data of equal rank': does this mean I should create a 10304×236 matrix of random numbers drawn from the normal distribution? Matlab has a function called 'normrnd' that does this, but it requires mu and sigma inputs. Would I use the same mu and sigma as those derived from the initial dataset? And is this more or less what is meant by 'expected eigenvalues'? I have no idea what a distribution of EXPECTED eigenvalues would look like.
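
To make my guess concrete, this is the sort of thing I'm imagining in Matlab (where roiData would be my 10304×236 nuisance ROI matrix; whether reusing its mean and standard deviation is the right move is exactly what I'm unsure about):

    % My guess at "normally distributed data of equal rank": a random matrix
    % the same size as the real data, drawn from a normal distribution whose
    % mu and sigma come from the real data (if that is what is intended).
    mu       = mean(roiData(:));
    sigma    = std(roiData(:));
    nullData = normrnd(mu, sigma, size(roiData, 1), size(roiData, 2));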

I guess my problem is more or less that I don't know how to make a 'null distribution' of eigenvalues.

Best Answer

The relevant term here is "Parallel Analysis".

In simple terms, the Monte Carlo simulation would generate 1000 (or so) 10304×236 matrices of random, normally distributed data (this assumes, of course, that the data you are analyzing are normally distributed; if your data were distributed differently, you'd use a different random distribution). You would then extract the eigenvalues from each data set you created, average each eigenvalue across all 1000 (or so) replications, and also compute confidence intervals. Finally, you compare the eigenvalues from your own data set to the average eigenvalues from the simulation.

The number of leading eigenvalues from your dataset that exceed the 99th confidence interval of the corresponding eigenvalues from the Monte Carlo simulations is the number of components the analysis suggests you retain.
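
Here is a minimal Matlab sketch of that procedure. Treat it as an outline rather than the paper's exact method: the variable name roiData, getting eigenvalues via eig(cov(...)), matching the real data's mean and standard deviation, and reading the paper's "99th confidence interval" as the 99th percentile across replications are all my assumptions (normrnd and prctile come from the Statistics Toolbox).

    % Parallel-analysis-style sketch: compare the observed PCA eigenvalues
    % against a null distribution built from random normal matrices of the
    % same size. roiData is assumed to be the 10304 x 236 nuisance ROI matrix.
    nReps      = 1000;                                 % Monte Carlo replications
    [nVox, nT] = size(roiData);
    obsEig     = sort(eig(cov(roiData)), 'descend');   % observed eigenvalues (236 of them)

    nullEig = zeros(nReps, nT);
    for r = 1:nReps
        % Random normal data of the same size, mean/sd matched to the real data.
        randData      = normrnd(mean(roiData(:)), std(roiData(:)), nVox, nT);
        nullEig(r, :) = sort(eig(cov(randData)), 'descend')';
    end

    meanNull = mean(nullEig, 1);        % average null eigenvalue, component by component
    cut99    = prctile(nullEig, 99, 1); % 99th percentile of each null eigenvalue

    % Retain components whose observed eigenvalue exceeds the corresponding
    % null cutoff; stop at the first component that fails.
    firstFail = find(obsEig(:)' <= cut99, 1);
    if isempty(firstFail)
        nRetain = nT;                   % every component beat its null cutoff
    else
        nRetain = firstFail - 1;
    end

(The third output of Matlab's pca function, latent, should give the same observed eigenvalues as eig(cov(roiData)); the eig form just keeps the sketch short.)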

For example, if the 25th eigenvalue from your data is 2.10 and the 26th is 1.97, while the 99th confidence interval of the 25th eigenvalue from the 1000 (or so) random data sets is 2.04 and that of the 26th is 2.01, this would suggest that you retain 25 components.
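
In code terms, continuing the sketch above with the numbers from this example, the decision is just a matter of finding the first component that fails to beat its cutoff:

    % Toy check of the example: the 25th observed eigenvalue exceeds its null
    % cutoff (2.10 > 2.04) but the 26th does not (1.97 < 2.01), so the first
    % failure is the 26th component and 25 components are retained.
    obsSlice  = [2.10, 1.97];                    % observed 25th and 26th eigenvalues
    cutSlice  = [2.04, 2.01];                    % corresponding null 99th-percentile cutoffs
    firstFail = find(obsSlice <= cutSlice, 1);   % = 2, i.e. the 26th component overall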

There are functions built to do this for you. One Matlab implementation is linked here:

http://www.mathworks.com/matlabcentral/fileexchange/44996-parallel-analysis--pa--to-for-determining-the-number-of-components-to-retain-from-pca/content/pa_test.m

I found that one by googling "Parallel Analysis in Matlab".
