MATLAB: Generating function handles with large numbers of variable coefficients

MATLAB, optimization

I've come across several kinds of optimization problem that require a parameter search in a high-dimensional space, where the number of coefficients to fit depends on the dataset.

Example: performing PCA on a set of 4k images. The SVD method does not work because the matrix would be ~8 petabytes.

In this context one normalizes each picture, subtracts their average, and renormalizes these difference images. Those images are eigenvectors. One then maximizes the function

sum(a(n)*I(n)).^2

such that

sum(a(n).^2)=1

This requires a function handle of the type

y = @(b,I) b(1)*I(:,:,1)+b(2)*I(:,:,2)+ .. +b(n)*I(:,:,n);

Along with the function to be minimized,

OLS = @(b) (2-sum(sum((y(b,I)))).^2);          % Minimize this to maximize norm(y(b,I))

Currently I need to write 'y' out explicitly.

Now the question: is there a way to express this in terms of matrix multiplication without explicitly writing it out?

Thank you,

Matt

Best Answer

You're using release R2018a, so you can take advantage of implicit expansion.

I = reshape(1:60, [3 4 5]);
b = [1 2 3 4 5];
R = reshape(b, [1, 1, numel(b)]);
S = sum(I.*R, 3);

The size of I is [3 4 5] and the size of R is [1 1 5] so the product I.*R has size [3 4 5]. Summing that product in the third dimension results in a matrix of size [3 4]. You can compare this with the result of explicitly writing out the multiplication.

S2 = b(1)*I(:,:,1)+b(2)*I(:,:,2)+ b(3)*I(:, :, 3)+b(4)*I(:, :, 4) +b(5)*I(:,:,5);
S-S2
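Since the question asks specifically for a matrix-multiplication form, here is a minimal sketch of one way to write it, using the same I and b as above. The names n, A, S3 and maxDiff are mine, not from the original post. The idea is to flatten each image into a column, multiply by the coefficient vector, and reshape the result back to image size.

```matlab
I = reshape(1:60, [3 4 5]);
b = [1 2 3 4 5];
n = size(I, 3);                                   % number of images, assumed stacked along dim 3
A = reshape(I, [], n);                            % each column is one flattened image, size [12 5]
S3 = reshape(A * b(:), size(I, 1), size(I, 2));   % weighted sum of images, back to [3 4]
S = sum(I .* reshape(b, [1 1 n]), 3);             % implicit-expansion version, for comparison
maxDiff = max(abs(S3(:) - S(:)));                 % largest difference between the two results
```

Both forms compute the same weighted sum. The matrix product avoids building the intermediate [3 4 5] array, which may matter when the images are large.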

Unfortunately the ability of the sum function to operate on multiple dimensions at a time was introduced in release R2018b, so you can't use that to simplify your OLS function. But if you could upgrade:

OLS = 2-sum(S, 'all').^2

Or combining the expressions together:

OLS = 2-sum(I.*R, 'all').^2

You don't have to define a separate variable with the reshaped b. I did that for clarity of the example.
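For R2018a, where sum(..., 'all') is not available, the pieces above could be wrapped back into function handles roughly as follows. This is only a sketch: the stand-in data and the start point b0 are hypothetical, and the objective is copied from the question rather than checked against the PCA problem itself.

```matlab
I = reshape(1:60, [3 4 5]);                       % stand-in data, same as above
y = @(b) sum(I .* reshape(b, 1, 1, []), 3);       % I is captured when the handle is created
OLS = @(b) 2 - sum(sum(y(b))).^2;                 % nested sum instead of sum(..., 'all')
b0 = ones(1, size(I, 3)) / sqrt(size(I, 3));      % hypothetical start point, unit norm up to rounding
val = OLS(b0);                                    % scalar objective value at b0
```

The handle works for any number of images because reshape(b, 1, 1, []) sizes itself from b. The unit-norm condition sum(b.^2) = 1 is not enforced here; it would have to be supplied separately to whatever constrained solver is used.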

But going back to your original question, the problem you're trying to solve is taking the pca of a matrix that's larger than can fit in memory, correct? Try storing your data as a tall array and calling pca on that tall array. The documentation for pca in release R2018a says it supports some of the syntaxes for pca on tall arrays.

Related Solutions

MATLAB: Any suggestions as to vectorization of a multivariate linear regression to several “univariate” linear regressions column-by-column of the covariate matrix

If the question is to solve the zero-intercept LSQ model of first order, then

bi = sum(x.*y)/sum(x.*x)

which is easily vectorized as

b=sum(bsxfun(@times,x,y))./sum(x.*x);

With later releases (R2016b onward), implicit expansion should eliminate the need for the explicit bsxfun; I have R2012b so can't test.
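For anyone on R2016b or later, a quick way to check the implicit-expansion form is to compare it with the bsxfun form and with a plain column-by-column solve. The data below is random and purely illustrative, and the names bOld, bNew and bRef are mine.

```matlab
rng(0);                                            % fixed seed, only so the sketch is repeatable
x = rand(20, 3);                                   % hypothetical covariate matrix, one regressor per column
y = rand(20, 1);                                   % hypothetical response vector
bOld = sum(bsxfun(@times, x, y)) ./ sum(x .* x);   % form from the answer above
bNew = sum(x .* y) ./ sum(x .* x);                 % implicit expansion, R2016b or later
bRef = zeros(1, size(x, 2));
for j = 1:size(x, 2)
    bRef(j) = x(:, j) \ y;                         % zero-intercept least squares, one column at a time
end
```

All three should agree to within rounding error, with one slope per column of x.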

MATLAB: What proportion of students failed the course (i.e., got a mark of less than 50)

I get 63, not 64:

x = [83 58 60 66 73 71 89 77 67 48 54 51 64 69 71 75 52 76 63 88 63 78 66 75 95 32 79 34 72 54 65 75 59 55 62 58 47 65 87 49 91 86 64 100 46 69 57 85 65 63 59 100 55 70 43 45 69 85 75 69 73 74 75 67 68 79 62 54 84 59 65 86 56 52 61 85 64 50 64 50 71 77 66 46 78 73 59 87 53 69 53 62 73 37 70 54 51 80 43 67 65 74 100 86 68 59 83 45 62 57 84 51 48 84 85 92 65 65 42 63 68 83 90 67 68 81 77 86 100 84 76 67 60 51 66 64 76 88 37 58 49 58 86 72 71 66 79 76 46 54 72 59 79 64 81 65 69 41 94 51 53 75 52 71 85 69 46 57 82 70 97 64 52 56 40 65 65 59 69 59 70 74 94 63 83 66 82 80 75 55 47 73 58 55 67 84 77 58 73 58 65 94 67 44 59 58 58 54 62 61 77 97 52 67 79 76 75 67 50 76 85 90 76 86 76 68 72 90 38 52 49 74 62 59 69 51 83 64 98 100 86 50 70 71 86 50 85 94 86 53 74 55 74 77 66 71 58 76 70 77 60 78 70 71 56 91 73 57 53 75 57 93 66 64 52 88 54 68 78 57 90 85 71 97 59 73 68 100 59 73 69 79 58 99 69 64 71 77 59 55 61 62 62 68 80 82 80 88 90 80 52 30 78 57 80 70 54 73 100 82 73 63 83 63 88 62 37 70 72 85 58 64 59 43 59 91 72 68 93 46 51 94 62 67 66 75 65 92 88 73 67 51 91 65 61 68 60 75 57 66 63 90 56 42 53 78 84 74 55 78 92 78 73 64 92 81 90 77 48 84 72 64 74 58 49 85 76 69 63 59 69 61 58 69 64 72 62 79 60 59 66 57 81 81 75 73 82 63 97 54 54 66 79 84 64 84 59 64 62 62 83 82 65 77 60 82 74 56 64 73 98 83 51 81 56 60 85 52 79 70 84 65 31 80 69 54 50 61 53 83 64 100 38 68 90 89 72 51 100 96 63 41 63 63 72 75 59 62 65 55 56 84 62 55 61 79 53 86 54 51 73 80 50 75 56 75 39 75 68 68 70 79 78 64 93 56 75 95 89 65 65 66 76 40 71 68 68 67 88 65 71 70 65 70 52 55 81 74 41 91 76 51 100 54 47 70 58 49 75 71 49 82 87 96 82 81 53 57 67 53 94 67 50 64 75 68 97 74 68 63 61 59 65 80 47 63 77 85 80 53 48 96 62 57 86 76 80 74 57 80 70 69 75 80 67 73 66 48 90 73 72 74 66 56 63 70 76 57 70 67 78 77 64 67 75 62 72 76 78 69 74 61 64 63 96 76 51 81 49 93 66 66 71 67 62 62 77 90 85 72 66 75 75 77 65 60 54 78 100 97 82 52 65 97 74 93 100 87 65 74 54 63 79 64 50 73 74 45 71 77 86 81 68 35 57 90 70 79 67 91 70 
85 92 61 82 80 92 73 62 83 61 60 61 92 34 80 77 43 70 83 91 73 60 63 81 67 55 76 72 90 53 59 77 74 65 78 85 63 80 99 84 61 86 79 72 78 54 85 68 97 66 67 80 77 68 56 72 74 71 81 61 45 88 62 74 84 40 78 57 69 63 75 71 66 82 70 92 58 71 52 82 97 89 65 74 63 79 64 90 61 55 52 67 80 82 49 77 76 57 51 88 61 65 70 75 66 62 100 75 71 80 53 60 73 98 54 76 92 100 68 76 72 55 93 66 89 54 62 85 62 68 72 63 88 53 58 69 51 82 84 70 77 69 64 94 70 47 95 60 69 60 62 67 94 56 69 77 78 64 97 58 64 58 67 78 54 62 88 75 58 52 69 80 69 89 64 80 67 76 74 40 59 92 68 84 46 30 75 66 73 55 57 79 72 62 58 65 75 82 82 75 74 73 59 59 98 100 71 60 77 44 64 96 69 59 44 74 58 97 71 69 68 60 67 74 83 90 81 73 77 53 79 75 69 88 75 89 71 71 100 83 67 74 42 52 56 75 76 61 76 69 83 50 63 78 96 43 67 60 47 82 82 76 69 54 67 43 100 68 63 83 96 66 76 72 79 54 66 68 83 63 36 46 82 56 53 77 58 83 59 73 100 78 60 64 85 70 76 64 81 83 63 65 83 66 80 65 83 99 66 56]'
failCount = sum(x < 50)
failRatio = failCount / numel(x)

I see

failCount =
    63
failRatio =
        0.0648815653964985
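As a small aside, the ratio can also be computed in one step, because the mean of a logical vector is the fraction of true entries. A sketch on a short made-up vector (these marks are hypothetical, not the data from the post):

```matlab
marks = [83 48 60 32 50 49];          % hypothetical marks, not the data from the post
failCount = sum(marks < 50);          % 48, 32 and 49 fail; a mark of exactly 50 does not
failRatio = mean(marks < 50);         % same as failCount / numel(marks)
```

Note that the strict inequality matters: using <= 50 would also count students who scored exactly 50.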
