Yes, it's possible.
Let $z_1, z_2, z_3$ be the non-constant parts of the orthogonal polynomials computed from the $x_i$. (Each is a column vector.) Regressing these against the powers $1, x, x^2, x^3$ must give a perfect fit. You can do this even when the software does not document its procedure for computing orthogonal polynomials. The regression of $z_j$ yields coefficients $\gamma_{j0},\ldots,\gamma_{j3}$ for which
$$z_{ij} = \gamma_{j0} + x_i\gamma_{j1} + x_i^2\gamma_{j2} + x_i^3\gamma_{j3}.$$
The result is a $4\times 4$ matrix $\Gamma$ that, upon right multiplication, converts the design matrix $X=\pmatrix{1 & x & x^2 & x^3}$ into $$Z=\pmatrix{1 & z_1 & z_2 & z_3} = X\Gamma.\tag{1}$$
After fitting the model
$$\mathbb{E}(Y) = Z\beta$$
and obtaining estimated coefficients $\hat\beta$ (a four-element column vector), you may substitute $(1)$ to obtain
$$\hat Y = Z\hat\beta = (X\Gamma)\hat\beta = X(\Gamma\hat\beta).$$
Therefore $\Gamma\hat\beta$ is the estimated coefficient vector for the model in terms of the original (raw, un-orthogonalized) powers of $x$.
The following R code illustrates these procedures and tests them with synthetic data.
n <- 10 # Number of observations
d <- 3 # Degree
#
# Synthesize a regressor, its powers, and orthogonal polynomials thereof.
#
x <- rnorm(n)
x.p <- outer(x, 0:d, `^`); colnames(x.p) <- c("Intercept", paste0("x.", 1:d))
z <- poly(x, d)
#
# Compute the orthogonal polynomials in terms of the powers via OLS.
#
xform <- lm(cbind(1, z) ~ x.p-1)
gamma <- coef(xform)
#
# Verify the transformation: all components should be tiny, certainly
# infinitesimal compared to 1.
#
if (!isTRUE(all.equal(as.vector(1 + crossprod(x.p %*% gamma - cbind(1,z)) - 1),
                      rep(0, (d+1)^2))))
  warning("Transformation is inaccurate.")
#
# Fit the model with orthogonal polynomials.
#
y <- x + rnorm(n)
fit <- lm(y ~ z)
#summary(fit)
#
# As a check, fit the model with raw powers.
#
fit.p <- lm(y ~ .-1, data.frame(x.p))
#summary(fit.p)
#
# Compare the results.
#
(rbind(Computed=as.vector(gamma %*% coef(fit)), Fit=coef(fit.p)))
if (!isTRUE(all.equal(as.vector(gamma %*% coef(fit)), as.vector(coef(fit.p)))))
  warning("Results were not the same.")
Answers to your questions:
- I usually use eig(cov(data)) to get the sample eigenvectors, but it is a matter of personal taste. Probably princomp is better, as it gives you the eigenvectors and the projections in one step. If you know $k$ and want exactly $k$ components, it can be easier to use eigs(cov(data),k); that saves you the hassle of computing the higher-order components, since it returns only the $k$ largest eigenvalues and eigenvectors of the covariance matrix.
- Yes, you are mostly using PCA as a dimensionality reduction step. (See next point about accuracy)
- Automatic determination of PCA dimensionality is a really big subject. Check Minka's work. A nice and quite easily read survey is given by Cangelosi and Goriely here. In short, there is no way to have the same data fidelity with your projected data as with the original feature vector, because by definition you exclude some modes of variation whenever $k < D$. Given that some of those modes might be just noise, your classification accuracy might actually be better; nevertheless, in terms of dataset reconstruction you just need to decide on an amount (i.e. percentage) of variation you want to retain, and you can easily calculate that from the eigenvalue ratio $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{D} \lambda_i}$ (a short sketch of this calculation follows this list).
- I haven't done that before. In general, your big set of samples is not a problem; their dimensionality is. You can calculate the eigenvectors of 100 million 10-dimensional samples in just seconds. The covariance of a 10-element sample with $10^8$ readings/dimensions each will take you a lot longer. There is work on online/sequential estimation of PCA, according to Wikipedia, though. Warmuth and Kuzmin's "Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension" seems to investigate just that. (Probably I should also read this paper now that I think of it.)
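To make the eig(cov(data)) recipe and the retained-variance calculation above concrete, here is a minimal MATLAB sketch; the variable names (data, scores), the 95% threshold, and the centering step are my own assumptions rather than anything from your question:

% Assumes data is an N-by-D matrix with one observation per row.
C = cov(data);                              % D-by-D sample covariance
[V, L] = eig(C);                            % eigenvectors (columns of V) and eigenvalues
[lambda, order] = sort(diag(L), 'descend'); % order components by explained variance
V = V(:, order);
retained = cumsum(lambda) / sum(lambda);    % variance kept by the first i components
k = find(retained >= 0.95, 1);              % e.g. keep 95% of the variance
scores = (data - mean(data)) * V(:, 1:k);   % project onto the first k components (uses implicit expansion)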
For your final GPU-related question: while I don't work with GPUs extensively myself (and I have never used MATLAB for that), I know that repeated I/O transfers to and from the GPU are a severe computational bottleneck. As such, the speed-up you get from the massively parallel computational capabilities of a GPU is nullified by the I/O overhead if you have a small dataset. You'll almost surely get more insightful answers about that point on StackOverflow.
Best Answer
In Matlab, let Y be the $k\times 3$ matrix whose rows are observations of $y$, and similarly let X be the $k\times n$ matrix of corresponding $X$ observations. To regress Y against X, try

beta = X \ Y;

The variable beta will now be $n\times 3$ and hold the regression coefficients. This is for a model that is 'linear' in your input $X$ variables. Populating a matrix with the $n(n+1)/2$ pairwise products of the $X$ variables is a bit trickier without using a bunch of for loops. One could do something like the sketch below.
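(A minimal, hedged sketch of one vectorized way to do it; the use of triu/find and the index variables i, j are my own choices, not something specified above.)

% Assumes X is the k-by-n matrix defined above.
[i, j] = find(triu(true(n)));   % all column-index pairs (i, j) with i <= j
Z = X(:, i) .* X(:, j);         % k-by-n(n+1)/2 matrix of pairwise products X(:,i).*X(:,j)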
At the end of this, modulo my programming errors, Z should be $k \times n(n+1)/2$ and have the order-2 powers of the elements of X. Then you can try

beta = [X, Z] \ Y;
Variable selection is another matter entirely. As a first pass, BIC seems to work OK for multivariate multiple regression.
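As an illustration of that first pass, here is one hedged way to compute a Gaussian BIC for such a fit in MATLAB; the formulation (maximized log-likelihood plus a parameter-count penalty) and the names W, q, nPar are my own assumptions, not something the answer specifies. You would evaluate it for each candidate set of predictors and keep the model with the smallest value.

% W is the k-by-p design matrix actually used (e.g. [X, Z]); Y is k-by-q; k is the number of observations.
beta  = W \ Y;                      % least-squares coefficients, p-by-q
E     = Y - W * beta;               % residuals
Sigma = (E' * E) / k;               % MLE of the residual covariance, q-by-q
q     = size(Y, 2);
p     = size(W, 2);
logL  = -k/2 * (q*log(2*pi) + log(det(Sigma)) + q);  % maximized Gaussian log-likelihood
nPar  = p*q + q*(q+1)/2;            % regression coefficients plus covariance entries
BIC   = -2*logL + nPar*log(k);      % smaller is better across candidate models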