MATLAB: Generating function handles with large numbers of variable coefficients


I've come across several kinds of optimization problems that require a parameter search in a high-dimensional space, where the number of coefficients to fit depends on the dataset.
Example: performing PCA on a set of 4k images. The SVD method does not work because the matrix would be ~8 petabytes.
In this context one normalizes each picture, subtracts their average, and renormalizes these difference images. Those images are eigenvectors. One then maximizes the function
such that
This requires a function handle of the type
y = @(b,I) b(1)*I(:,:,1)+b(2)*I(:,:,2)+ .. +b(n)*I(:,:,n);
Along with the function to be minimized,
OLS = @(b) (2-sum(sum(y(b,I))).^2); % Minimize this to maximize norm(y(b,I)); y takes both b and I
Currently I need to write 'y' out explicitly.
Now the question: is there a way to express this in terms of matrix multiplication without explicitly writing it out?
Thank you,

Best Answer

You're using release R2018a, so you can take advantage of implicit expansion (introduced in R2016b).
I = reshape(1:60, [3 4 5]);
b = [1 2 3 4 5];
R = reshape(b, [1, 1, numel(b)]);
S = sum(I.*R, 3);
The size of I is [3 4 5] and the size of R is [1 1 5] so the product I.*R has size [3 4 5]. Summing that product in the third dimension results in a matrix of size [3 4]. You can compare this with the result of explicitly writing out the multiplication.
S2 = b(1)*I(:,:,1) + b(2)*I(:,:,2) + b(3)*I(:,:,3) + b(4)*I(:,:,4) + b(5)*I(:,:,5);
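The same idea can be wrapped in a function handle that works for any number of coefficients, which is what the question asks for. The following is a minimal sketch, assuming R2016b or later for implicit expansion; the handle name y follows the question, and the loop (with its variable S_loop) is only there as a check.

```matlab
% Example data: five 3-by-4 "images" stacked along the third dimension
I = reshape(1:60, [3 4 5]);
b = [1 2 3 4 5];

% Weighted sum of slices; works whenever numel(b) == size(I, 3)
y = @(b, I) sum(I .* reshape(b, 1, 1, []), 3);
S = y(b, I);

% Check against an explicit loop over the slices
S_loop = zeros(size(I, 1), size(I, 2));
for k = 1:numel(b)
    S_loop = S_loop + b(k) * I(:, :, k);
end
assert(isequal(S, S_loop))
```

Using [] in the reshape call lets MATLAB infer the length of the third dimension, so the handle does not need to know n in advance.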
Unfortunately the ability of the sum function to operate on multiple dimensions at a time was introduced in release R2018b, so you can't use that to simplify your OLS function. But if you could upgrade:
OLS = 2-sum(S, 'all').^2
Or combining the expressions together:
OLS = 2-sum(I.*R, 'all').^2
You don't have to define a separate variable with the reshaped b. I did that for clarity of the example.
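If upgrading is not an option, the objective can also be written so that it runs on R2018a, and in terms of an actual matrix multiplication as the question asks. This is a sketch under the assumption that the image stack fits in memory once flattened; the names A and n are introduced here for illustration only.

```matlab
I = reshape(1:60, [3 4 5]);
b = [1 2 3 4 5];
n = size(I, 3);

% Flatten each slice into one column: A is (rows*cols)-by-n
A = reshape(I, [], n);

% Weighted sum of slices as a matrix-vector product, reshaped back to an image
y = @(b) reshape(A * b(:), size(I, 1), size(I, 2));

% Objective from the question, written without sum(..., 'all')
OLS = @(b) 2 - sum(A * b(:)).^2;

% Check against the implicit-expansion result
S = sum(I .* reshape(b, 1, 1, []), 3);
assert(isequal(y(b), S))
```

Because sum(A * b(:)) equals sum(A, 1) * b(:), the column sums of A could be computed once up front, which would make each evaluation of OLS a dot product of length n.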
But going back to your original question, the problem you're trying to solve is computing the PCA of a matrix that is too large to fit in memory, correct? Try storing your data as a tall array and calling pca on that tall array. The documentation for pca in release R2018a says it supports some of the syntaxes for pca on tall arrays.
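A rough sketch of that suggestion, assuming the Statistics and Machine Learning Toolbox is installed. The toy matrix X stands in for real data; for a dataset that does not fit in memory, the tall array would be built from a datastore rather than from an in-memory matrix, and which outputs of pca are available for tall inputs should be checked against the documentation for your release.

```matlab
% Toy data: 1000 observations (rows) of 5 variables (columns)
X = rand(1000, 5);

% Wrap as a tall array; real use would create it from a datastore instead
tX = tall(X);

% Principal component coefficients, one column per component
coeff = pca(tX);
```

Note that rows are treated as observations and columns as variables, so the layout of the image data determines how large the resulting coefficient matrix is.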