MATLAB: Generating function handles with large numbers of variable coefficients

MATLABoptimization

I've come across a number of types of optimization process which require a parameter search in a high-dimensional space where the number of coefficients to be fit is based on a dataset.
Example: performing PCA on a set of 4k images. The SVD method does not work because the matrix would be ~8 petabytes.
In this context one normalizes each picture, subtracts their average, and renormalizes these difference images. Those images are eigenvectors. One then maximizes the function
sum(a(n)*I(n)).^2
such that
sum(a(n).^2)=1
This requires a function handle of the type
y = @(b,I) b(1)*I(:,:,1)+b(2)*I(:,:,2)+ .. +b(n)*I(:,:,n);
Along with the function to be minimized,
OLS = @(b) (2-sum(sum((y(b)))).^2); % Minimize this to maximize norm(y(b))
Currently i need to write 'y' out explicitly.
Now the question: is there a way to express this in terms of matrix multiplication without explicitly writing it out?
Thank you,
Matt

Best Answer

You're using release R2018a, so you can take advantage of implicit expansion.
I = reshape(1:60, [3 4 5]);
b = [1 2 3 4 5];
R = reshape(b, [1, 1, numel(b)]);
S = sum(I.*R, 3);
The size of I is [3 4 5] and the size of R is [1 1 5] so the product I.*R has size [3 4 5]. Summing that product in the third dimension results in a matrix of size [3 4]. You can compare this with the result of explicitly writing out the multiplication.
S2 = b(1)*I(:,:,1)+b(2)*I(:,:,2)+ b(3)*I(:, :, 3)+b(4)*I(:, :, 4) +b(5)*I(:,:,5);
S-S2
Unfortunately the ability of the sum function to operate on multiple dimensions at a time was introduced in release R2018b, so you can't use that to simplify your OLS function. But if you could upgrade:
OLS = 2-sum(S, 'all').^2
Or combining the expressions together:
OLS = 2-sum(I.*R, 'all').^2
You don't have to define a separate variable with the reshaped b. I did that for clarity of the example.
But going back to your original question, the problem you're trying to solve is taking the pca of a matrix that's larger than can fit in memory, correct? Try storing your data as a tall array and calling pca on that tall array. The documentation for pca in release R2018a says it supports some of the syntaxes for pca on tall arrays.