Yes, it's possible.
Let $z_1, z_2, z_3$ be the non-constant parts of the orthogonal polynomials computed from the $x_i$. (Each is a column vector.) Regressing these against the powers $1, x, x^2, x^3$ must give a perfect fit. You can do this even when the software does not document its procedure for computing orthogonal polynomials. The regression of $z_j$ yields coefficients $\gamma_{j0},\ldots,\gamma_{j3}$ for which
$$z_{ij} = \gamma_{j0} + x_i\gamma_{j1} + x_i^2\gamma_{j2} + x_i^3\gamma_{j3}.$$
The result is a $4\times 4$ matrix $\Gamma$ that, upon right multiplication, converts the design matrix $X=\pmatrix{1 & x & x^2 & x^3}$ into $$Z=\pmatrix{1 & z_1 & z_2 & z_3} = X\Gamma.\tag{1}$$
After fitting the model
$$\mathbb{E}(Y) = Z\beta$$
and obtaining estimated coefficients $\hat\beta$ (a four-element column vector), you may substitute $(1)$ to obtain
$$\hat Y = Z\hat\beta = (X\Gamma)\hat\beta = X(\Gamma\hat\beta).$$
Therefore $\Gamma\hat\beta$ is the estimated coefficient vector for the model in terms of the original (raw, un-orthogonalized) powers of $x$.
The following R code illustrates these procedures and tests them with synthetic data.
n <- 10 # Number of observations
d <- 3 # Degree
#
# Synthesize a regressor, its powers, and orthogonal polynomials thereof.
#
x <- rnorm(n)
x.p <- outer(x, 0:d, `^`); colnames(x.p) <- c("Intercept", paste0("x.", 1:d))
z <- poly(x, d)
#
# Compute the orthogonal polynomials in terms of the powers via OLS.
#
xform <- lm(cbind(1, z) ~ x.p-1)
gamma <- coef(xform)
#
# Verify the transformation: all components should be tiny, certainly
# infinitesimal compared to 1.
#
if (!isTRUE(all.equal(as.vector(1 + crossprod(x.p %*% gamma - cbind(1,z)) - 1),
                      rep(0, (d+1)^2))))
  warning("Transformation is inaccurate.")
#
# Fit the model with orthogonal polynomials.
#
y <- x + rnorm(n)
fit <- lm(y ~ z)
#summary(fit)
#
# As a check, fit the model with raw powers.
#
fit.p <- lm(y ~ .-1, data.frame(x.p))
#summary(fit.p)
#
# Compare the results.
#
(rbind(Computed=as.vector(gamma %*% coef(fit)), Fit=coef(fit.p)))
if (!isTRUE(all.equal(as.vector(gamma %*% coef(fit)), as.vector(coef(fit.p)))))
  warning("Results were not the same.")
Answers to your questions:
- I usually use eig(cov(data)) to get the sample eigenvectors, but it is a matter of personal taste. Probably princomp is better, as it gives you the eigenvectors and the projections in one step. If you know $k$ and want exactly $k$ components, it can be easier to use eigs(cov(data),k); that saves you the hassle of computing the higher-order components, since it returns only the $k$ largest eigenvalues and eigenvectors of the covariance matrix.
- Yes, you are mostly using PCA as a dimensionality reduction step. (See next point about accuracy)
- Automatic determination of PCA dimensionality is a really big subject. Check Minka's work. A nice and quite easily read survey is given by Cangelosi and Goriely here. In short, there is no way to have the same data fidelity with your projected data as with the original feature vector, because by definition you exclude some modes of variation whenever $k < D$. Given that some of those modes might be just noise, your classification accuracy might actually be better; nevertheless, in terms of dataset reconstruction you just need to decide on an amount (i.e. percentage) of variation you want to retain, and you can easily calculate that from the eigenvalue ratio $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{D} \lambda_i}$ (a short sketch of this calculation follows this list).
- I haven't done that before. In general, your big set of samples is not a problem; their dimensionality is. You can calculate the eigenvectors of 100 million 10-dimensional samples in just seconds. The covariance of a 10-element sample with $10^8$ readings/dimensions each will take you a lot longer. There is work on online/sequential estimation of PCA, according to Wikipedia, though. Warmuth and Kuzmin's "Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension" seems to investigate just that. (Probably I should also read this paper now that I think of it.)
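To make the eig(cov(data)) recipe and the retained-variance calculation above concrete, here is a minimal MATLAB sketch; the variable names (data, scores), the 95% threshold, and the centering step are my own assumptions rather than anything from your question:

% Assumes data is an N-by-D matrix with one observation per row.
C = cov(data);                              % D-by-D sample covariance
[V, L] = eig(C);                            % eigenvectors (columns of V) and eigenvalues
[lambda, order] = sort(diag(L), 'descend'); % order components by explained variance
V = V(:, order);
retained = cumsum(lambda) / sum(lambda);    % variance kept by the first i components
k = find(retained >= 0.95, 1);              % e.g. keep 95% of the variance
scores = (data - mean(data)) * V(:, 1:k);   % project onto the first k components (uses implicit expansion)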
For your final GPU-related question: while I don't work with GPUs extensively myself (and I have never used MATLAB for that), I know that repeated I/O transfers to and from the GPU are a severe computational bottleneck. As such, the speed-up you get from the massively parallel computational capabilities of a GPU is nullified by the I/O overhead if you have a small dataset. You'll almost surely get more insightful answers about that point on StackOverflow.
Best Answer
In Matlab, let Y be the $k\times 3$ matrix whose rows are observations of $y$, and similarly let X be the $k\times n$ matrix of corresponding $X$ observations. To regress Y against X, try

beta = X \ Y;

The variable beta will now be $n\times 3$ and hold the regression coefficients. This is for a model that is 'linear' in your input $X$ variables. Populating a matrix with the $n(n+1)/2$ pairwise products of the $X$ variables is a bit trickier without using a bunch of for loops. One could do something like the sketch below.
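(A minimal, hedged sketch of one vectorized way to do it; the use of triu/find and the index variables i, j are my own choices, not something specified above.)

% Assumes X is the k-by-n matrix defined above.
[i, j] = find(triu(true(n)));   % all column-index pairs (i, j) with i <= j
Z = X(:, i) .* X(:, j);         % k-by-n(n+1)/2 matrix of pairwise products X(:,i).*X(:,j)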
At the end of this, modulo my programming errors, Z should be $k \times n(n+1)/2$ and have the order-2 powers of the elements of X. Then you can try

beta = [X, Z] \ Y;
Variable selection is another matter entirely. As a first pass, BIC seems to work OK for multivariate multiple regression.
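As an illustration of that first pass, here is one hedged way to compute a Gaussian BIC for such a fit in MATLAB; the formulation (maximized log-likelihood plus a parameter-count penalty) and the names W, q, nPar are my own assumptions, not something the answer specifies. You would evaluate it for each candidate set of predictors and keep the model with the smallest value.

% W is the k-by-p design matrix actually used (e.g. [X, Z]); Y is k-by-q; k is the number of observations.
beta  = W \ Y;                      % least-squares coefficients, p-by-q
E     = Y - W * beta;               % residuals
Sigma = (E' * E) / k;               % MLE of the residual covariance, q-by-q
q     = size(Y, 2);
p     = size(W, 2);
logL  = -k/2 * (q*log(2*pi) + log(det(Sigma)) + q);  % maximized Gaussian log-likelihood
nPar  = p*q + q*(q+1)/2;            % regression coefficients plus covariance entries
BIC   = -2*logL + nPar*log(k);      % smaller is better across candidate models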