You were very very nearly there and got caught by a subtle issue in working with matrices in R. I worked through from your final_data
and got the correct results independently. Then I had a closer look at your code. To cut a long story short, where you wrote
row_orig_data = t(t(feat_vec[1,]) %*% t(trans_data))
you would have been ok if you had written
row_orig_data = t(t(feat_vec) %*% t(trans_data))
instead (because you'd zeroed out the part of trans_data
that was projected on the second eigenvector). As it was, you were trying to multiply a $2\times1$ matrix by a $2\times10$ matrix, but R did not give you an error. The problem is that t(feat_vec[1,])
is treated as $1\times2$: selecting a single row with feat_vec[1,] drops the result to a plain vector, and t() then promotes that vector to a row matrix. Trying row_orig_data = t(as.matrix(feat_vec[1,],ncol=1,nrow=2) %*% t(trans_data))
would have given you a non-conformable arguments
error. The following, possibly more along the lines of what you intended, would also have worked
row_orig_data = t(as.matrix(feat_vec[1,],ncol=1,nrow=2) %*% t(trans_data)[1,])
as it multiplies a $2\times1$ matrix by a $1\times10$ matrix (note that you could have used the original final_data
matrix here). It isn't necessary to do it this way, but it's nicer mathematically because it shows that you are getting $20=2\times10$ values in row_orig_data
from $12=2\times1 + 1\times10$ values on the right hand side.
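To see the R behaviour concretely, here is a small sketch using stand-in objects with the same shapes as in your code (a $2\times2$ feat_vec and a $10\times2$ trans_data); the values are made up:
feat_vec <- matrix(c(0.68, 0.74, -0.74, 0.68), nrow=2)  # stand-in 2x2 matrix of eigenvectors
trans_data <- matrix(rnorm(20), nrow=10, ncol=2)        # stand-in for the 10x2 transformed data
dim(feat_vec[1,])                       # NULL: selecting one row drops to a plain vector
dim(t(feat_vec[1,]))                    # 1 2: t() promotes that vector to a row matrix
dim(t(feat_vec[1,]) %*% t(trans_data))  # 1 10: 1x2 times 2x10 conforms, so R gives no error
dim(as.matrix(feat_vec[1,]))            # 2 1: as.matrix() gives the column matrix instead
try(as.matrix(feat_vec[1,]) %*% t(trans_data))      # Error: non-conformable arguments
dim(as.matrix(feat_vec[1,]) %*% t(trans_data)[1,])  # 2 10: 2x1 times (promoted) 1x10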
I've left my original answer below, as someone might find it useful, and it does demonstrate getting the required plots. It also shows that the code can be a bit simpler by getting rid of some unnecessary transposes: $(XY)^T=Y^TX^T$, so t(t(p) %*% t(q)) = q %*% p.
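A quick check of that identity, with p and q just arbitrary conformable matrices:
p <- matrix(rnorm(6), nrow=3)   # 3x2
q <- matrix(rnorm(12), nrow=4)  # 4x3
all.equal(t(t(p) %*% t(q)), q %*% p)  # TRUE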
Re your edit, I've added the principal component line in green to my plot below. In your question you had the slope as $x/y$ rather than $y/x$.
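(For anyone following along without the question's workspace: the snippets below assume objects along these lines. This is only a sketch with example data, not your actual values, and the construction of feat_vec, with the eigenvectors in its rows, is my guess at what your code does.)
set.seed(1)
d <- data.frame(x=rnorm(10, mean=1.8, sd=0.8))
d$y <- d$x + rnorm(10, sd=0.4)                           # example 2-D data
row_data_adj <- rbind(d$x - mean(d$x), d$y - mean(d$y))  # 2x10: one row per variable, one point per column
e <- eigen(cov(cbind(d$x, d$y)))                         # e$vectors holds the eigenvectors in its columns
feat_vec <- t(e$vectors)                                 # eigenvectors in rows, so feat_vec %*% row_data_adj = S^T X
final_data <- data.frame(t(feat_vec %*% row_data_adj))   # 10x2: the data in the new basis, one point per row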
Write
d_in_new_basis = as.matrix(final_data)
then to get your data back in its original basis you need
d_in_original_basis = d_in_new_basis %*% feat_vec
You can zero out the parts of your data that are projected along the second component using
d_in_new_basis_approx = d_in_new_basis
d_in_new_basis_approx[,2] = 0
and you can then transform as before
d_in_original_basis_approx = d_in_new_basis_approx %*% feat_vec
Plotting these on the same plot, together with the principal component line in green, shows you how the approximation worked.
plot(x=d_in_original_basis[,1]+mean(d$x),
y=d_in_original_basis[,2]+mean(d$y),
pch=16, xlab="x", ylab="y", xlim=c(0,3.5),ylim=c(0,3.5),
main="black=original data\nred=original data restored using only a single eigenvector")
points(x=d_in_original_basis_approx[,1]+mean(d$x),
y=d_in_original_basis_approx[,2]+mean(d$y),
pch=16,col="red")
points(x=c(mean(d$x)-e$vectors[1,1]*10, mean(d$x)+e$vectors[1,1]*10),
y=c(mean(d$y)-e$vectors[2,1]*10, mean(d$y)+e$vectors[2,1]*10),
type="l", col="green")
Let's rewind to what you had. This line was ok
final_data = data.frame(t(feat_vec %*% row_data_adj))
The crucial bit here is feat_vec %*% row_data_adj
which is equivalent to $Y=S^TX$ where $S$ is the matrix of eigenvectors, $X$ is your mean-adjusted data matrix with one data point per column (and one row per variable), and $Y$ is the data in the new basis. What this is saying is that the first row of $Y$ is the sum of (rows of $X$ weighted by the first eigenvector). And the second row of $Y$ is the sum of (rows of $X$ weighted by the second eigenvector).
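With the sketch objects from earlier, that reading can be checked directly (this check is mine, not part of the original code):
Y <- feat_vec %*% row_data_adj
max(abs(Y[1,] - (feat_vec[1,1]*row_data_adj[1,] + feat_vec[1,2]*row_data_adj[2,])))  # ~0: row 1 of Y is the rows of X weighted by the 1st eigenvector, summed
max(abs(Y[2,] - (feat_vec[2,1]*row_data_adj[1,] + feat_vec[2,2]*row_data_adj[2,])))  # ~0: likewise for the 2nd eigenvector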
Then you had
trans_data = final_data
trans_data[,2] = 0
This is ok: you're just zeroing out the parts of your data that are projected along the second component. Where it goes wrong is
row_orig_data = t(t(feat_vec[1,]) %*% t(trans_data))
Writing $\hat Y$ for the matrix of data $Y$ in the new basis, with zeros in the second row, and writing $\mathbf{e}_1$ for the first eigenvector, the business end of this code t(feat_vec[1,]) %*% t(trans_data)
comes down to $\mathbf{e}_1 \hat Y$.
As explained above (this is where I realised the subtle R problem and wrote the first part of my answer), you are mathematically trying to multiply a $2\times1$ vector by a $2\times10$ matrix, which doesn't work. What you should do instead is take the first row of $\hat Y$, which equals the first row of $Y$: call this $\mathbf{y}_1$ (a $1\times10$ row vector). Then multiply $\mathbf{e}_1$ and $\mathbf{y}_1$ together. The $i$th column of the result $\mathbf{e}_1\mathbf{y}_1$ is the eigenvector $\mathbf{e}_1$ weighted by the first coordinate only of the $i$th point in the new basis, which is what you want.
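In code, with the sketch objects from earlier and the trans_data defined just above, the intended computation is this outer product (again just a sketch):
e1 <- feat_vec[1,]                   # first eigenvector, as a plain length-2 vector
y1 <- t(trans_data)[1,]              # first row of Y-hat = the first-coordinate scores of the 10 points
outer_prod <- as.matrix(e1) %*% y1   # 2x1 times (promoted) 1x10 = 2x10
max(abs(outer_prod[,3] - y1[3]*e1))  # ~0: column i is e1 scaled by the ith point's first coordinate
row_orig_data <- t(outer_prod)       # 10x2: the reconstruction using the first component only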
Best Answer
Pages 13-20 of the tutorial you posted provide a very intuitive geometric explanation of how PCA is used for dimensionality reduction.
The 13x13 matrix you mention is probably the "loading" or "rotation" matrix (I'm guessing your original data had 13 variables?) which can be interpreted in one of two (equivalent) ways:
The (absolute values of the) columns of your loading matrix describe how much each variable proportionally "contributes" to each component.
The rotation matrix rotates your data onto the basis defined by your rotation matrix. So if you have 2-D data and multiply your data by your rotation matrix, your new X-axis will be the first principal component and the new Y-axis will be the second principal component.
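For a concrete 2-D illustration of that second interpretation, here is a small sketch using prcomp on made-up data (the variable names are arbitrary):
set.seed(42)
X <- cbind(a=rnorm(100), b=rnorm(100))
X[,"b"] <- X[,"a"] + 0.3*X[,"b"]                  # two correlated variables
p <- prcomp(X)                                    # p$rotation is the 2x2 rotation/loading matrix
rotated <- scale(X, center=p$center, scale=FALSE) %*% p$rotation
all.equal(unname(rotated), unname(p$x))           # TRUE: centred data times the rotation = the PC scores
The first column of rotated is each point's coordinate along the first principal component (the new X-axis), and the second column is its coordinate along the second.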
EDIT: This question gets asked a lot, so I'm just going to lay out a detailed visual explanation of what is going on when we use PCA for dimensionality reduction.
Consider a sample of 50 points generated from y=x + noise. The first principal component will lie along the line y=x and the second component will lie along the line y=-x, as shown below.
The aspect ratio messes it up a little, but take my word for it that the components are orthogonal. Applying PCA will rotate our data so the components become the x and y axes:
The data before the transformation are circles, the data after are crosses. In this particular example, the data wasn't rotated so much as it was flipped across the line y=-2x, but we could have just as easily inverted the y-axis to make this truly a rotation without loss of generality as described here.
The bulk of the variance, i.e. the information in the data, is spread along the first principal component (which is represented by the x-axis after we have transformed the data). There's a little variance along the second component (now the y-axis), but we can drop this component entirely without significant loss of information. So to collapse this from two dimensions into 1, we let the projection of the data onto the first principal component completely describe our data.
We can partially recover our original data by rotating (ok, projecting) it back onto the original axes.
The dark blue points are the "recovered" data, whereas the empty points are the original data. As you can see, we have lost some of the information from the original data, specifically the variance in the direction of the second principal component. But for many purposes, this compressed description (using the projection along the first principal component) may suit our needs.
Here's the code I used to generate this example in case you want to replicate it yourself. If you reduce the variance of the noise component on the second line, the amount of data lost by the PCA transformation will decrease as well because the data will converge onto the first principal component:
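The code block itself seems to have been lost from this copy of the answer, so here is a minimal sketch along the lines described above (50 points from y = x + noise, PCA, keep only the first component, rotate back); it is not the author's original code, and the noise standard deviation on the second line plays the role of the "noise component" referred to in the text:
x <- rnorm(50)
y <- x + rnorm(50, sd=0.25)                      # noise component: reduce sd and less information is lost below
orig_data <- cbind(x, y)
p <- prcomp(orig_data)                           # centres the data and finds the principal components
scores <- p$x                                    # the data expressed in the rotated (PC) basis
scores_approx <- scores
scores_approx[,2] <- 0                           # drop the second principal component
recovered <- scores_approx %*% t(p$rotation)     # rotate back onto the original axes
recovered <- sweep(recovered, 2, p$center, "+")  # undo the centring
plot(orig_data, asp=1, xlab="x", ylab="y")       # original data as open circles
points(recovered, pch=19, col="darkblue")        # recovered data as filled dark blue points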