Solved – Best multivariate polynomial regression

mathematicaMATLABrregression

I have a dataset (x,y) where x is a n-dimensional vector and y is an m-dimensional vector. (m=3, n>2) My goal is to find the best polynomial in x fitting the (x,y) dataset.

The dimension of x is pretty big (right now it is 25), and I don't want to enter manually all the possibilities (ie x1*x3*x5, x1*x4*x6, …). I can use Matlab, Mathematica and R. How can I do this?

Also, I would be interested in hearing your suggestions about the following problem: how can I choose from the result the most relevant coefficients? (maybe x1*x2 is more relevant than x2*x3)

Thank you

Best Answer

In Matlab, let Y be the $k\times 3$ matrix whose rows are observations of $y$, and similarly let X be the $k\times n$ matrix of corresponding $X$ observations. To regress Y against X, try

beta = X \ Y;

The variable beta will now be $n\times 3$ and have the regression coefficients.

This is for a model that is 'linear' in your input $X$ variables. To populate a matrix with the $n(n+1)/2$ pairs of products of the $X$ variables is a bit more tricky without using a bunch of for loops. One could do something like:

Z = bsxfun(@times,X,permute(X,[1,3,2]));  % this is k x n x n
Z = reshape(Z,size(Z,1),[]);              % this is k x (n^2)
o2n = 1:size(X,2);                        % elements 1 to n
keepidx = bsxfun(@ge,o2n,permute(o2n,[1,3,2]));  % keep only if i >= j
keepidx = keepidx(:)';
Z = Z(:,keepidx);  

At the end of this, modulo my programming errors, Z should be $k \times n(n+1)/2$ and have the order 2 powers of elements of X. Then you can try

beta = [X,Z] \ Y;

Variable selection is another matter entirely. As a first pass, BIC seems to work OK for multivariate multiple regression.