In short: yes, this is very possible, and probably even the generic situation.
PCA components have the special feature that they are mutually uncorrelated. If we call your PCA components $x_a$, then
$$cov(x_a, x_b) = 0 ~~~ \text{for} ~ a \neq b$$
But they can still be correlated with some other variable, i.e. you can have
$$cov(x_a, y) \neq 0 ~~~ \text{and} ~~~ cov(x_b, y) \neq 0 $$
at the same time.
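Here is a quick numerical sketch of this (using numpy and scikit-learn purely for illustration; the data-generating process is made up):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two correlated predictors and a response that depends on both
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 2.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)

# Principal components of X
Z = PCA(n_components=2).fit_transform(X)   # columns play the role of x_a, x_b

print(np.cov(Z[:, 0], Z[:, 1])[0, 1])  # ~0: the PCs are mutually uncorrelated
print(np.cov(Z[:, 0], y)[0, 1])        # non-zero: PC1 is correlated with y
print(np.cov(Z[:, 1], y)[0, 1])        # non-zero: PC2 is correlated with y too
```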
One can prove this mathematically, but the geometric interpretation of the sample (co)variance helps to see it intuitively:
Say we label your datapoints by the index $i = 1 , \cdots , N$, then we can think of each principal component as a vector $ \vec{x_a} $ by simply stacking the observed values together
$$\vec{x_a} = (x_{a,1}, x_{a,2}, \cdots, x_{a,N} )^T $$
Now, looking at the definition of variance
$$ var(x_a) = \mathbb{E}[x_a^2] = \frac{1}{N} \sum_{i=1}^N x_{a,i}^2 $$
where we used, w.l.o.g., that the principal components are centered, i.e. $\mathbb{E}[x_a] = 0$. Now you can see that the variance of $x_a$ is simply (up to the factor $1/N$) the squared Euclidean norm, i.e. the squared length, of the vector $\vec{x_a}$
$$ var(x_a) = \frac{1}{N} ||\vec{x_a}||^2 $$
Similarly, looking at the covariance
$$ cov(x_a,x_b) = \mathbb{E}[x_a x_b]= \frac{1}{N} \sum_{i=1}^N x_{a,i}x_{b,i} $$
we recognise that as the dot product of $\vec{x_a}$ with $\vec{x_b}$
$$ cov(x_a,x_b) = \frac{1}{N} \, \vec{x_a} \cdot \vec{x_b} = \frac{1}{N} \, ||\vec{x_a}|| \, ||\vec{x_b}|| \, \cos(\phi_{ab}) $$
and $ cov(x_a,y) = \frac{1}{N} \, \vec{x_a} \cdot \vec{y}$.
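A tiny numerical check of this correspondence (assuming, as above, that the vectors have been centered):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x_a = rng.normal(size=N)
x_a -= x_a.mean()            # centre, so E[x_a] = 0
x_b = rng.normal(size=N)
x_b -= x_b.mean()

# var(x_a) = ||x_a||^2 / N   and   cov(x_a, x_b) = (x_a . x_b) / N
print(np.var(x_a), np.dot(x_a, x_a) / N)         # equal
print(np.mean(x_a * x_b), np.dot(x_a, x_b) / N)  # equal
```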
Having this geometric interpretation of variance and covariance allows us to re-state the question in a way that makes the answer obvious:
Can you have two orthogonal vectors, $\vec{x_a}$ and $\vec{x_b}$, that each have a non-zero overlap with some other vector $\vec{y}$?
The answer to that is clearly yes. A simple example is the 2-d plane: say $\vec{x_a}$ points along the horizontal axis and $\vec{x_b}$ points along the vertical axis, then
- they are mutually orthogonal and
- any other generic vector in the 2-d plane will have some non-zero overlap with both of them
This also answers your second question: no, it's not useless to fit a regression with multiple principal components and one response variable :)
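For completeness, the toy 2-d example above in code (purely illustrative):

```python
import numpy as np

x_a = np.array([1.0, 0.0])   # horizontal axis
x_b = np.array([0.0, 1.0])   # vertical axis
y   = np.array([2.0, 3.0])   # some generic vector in the plane

print(np.dot(x_a, x_b))  # 0: mutually orthogonal
print(np.dot(x_a, y))    # 2: non-zero overlap with y
print(np.dot(x_b, y))    # 3: non-zero overlap with y
```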
Best Answer
You don't choose a subset of your original 99 (100-1) variables.
Each of the principal components is a linear combination of all 99 predictor variables (x-variables, IVs, ...). If you use the first 40 principal components, each of them is a function of all 99 original predictor variables. (At least with ordinary PCA - there are sparse/regularized versions such as the SPCA of Zou, Hastie and Tibshirani that will yield components based on fewer variables.)
Consider the simple case of two positively correlated variables, which for simplicity we will assume are equally variable. Then the first principal component will be a (fractional) multiple of the sum of both variates and the second will be a (fractional) multiple of the difference of the two variates; if the two are not equally variable, the first principal component will weight the more-variable one more heavily, but it will still involve both.
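A small sketch of that two-variable case (the data are made up; the signs of the PCA weights are arbitrary, so they may come out flipped):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 5000
common = rng.normal(size=n)
x1 = common + 0.5 * rng.normal(size=n)   # two positively correlated,
x2 = common + 0.5 * rng.normal(size=n)   # roughly equally variable variables
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(X)
print(pca.components_)
# roughly [[ 0.71,  0.71],   <- PC1: a multiple of the sum x1 + x2
#          [ 0.71, -0.71]]   <- PC2: a multiple of the difference x1 - x2
```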
So you start with your 99 x-variables, from which you compute your 40 principal components by applying the corresponding weights on each of the original variables. [NB in my discussion I assume $y$ and the $X$'s are already centered.]
You then use your 40 new variables as if they were predictors in their own right, just as you would with any multiple regression problem. (In practice, there are more efficient ways of getting the estimates, but let's leave the computational aspects aside and just deal with the basic idea.)
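Schematically, this could look as follows (a sketch with scikit-learn; the sizes $n = 200$, $p = 99$, $k = 40$ and the simulated data are just placeholders for your setting):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n, p, k = 200, 99, 40           # observations, original predictors, PCs kept
X = rng.normal(size=(n, p))     # stand-in for your 99 x-variables
y = X @ rng.normal(size=p) + rng.normal(size=n)

pca = PCA(n_components=k)
Z = pca.fit_transform(X)        # the 40 new variables (principal-component scores)

# Use the PCs as ordinary predictors in a multiple regression
fit = LinearRegression().fit(Z, y)
y_hat = fit.predict(Z)
```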
In respect of your second question, it's not clear what you mean by "reversing of the PCA".
Your PCs are linear combinations of the original variates. Let's say your original variates are in $X$, and you compute $Z=XW$ (where $X$ is $n\times 99$ and $W$ is the $99\times 40$ matrix which contains the principal component weights for the $40$ components you're using), then you estimate $\hat{y}=Z\hat{\beta}_\text{PC}$ via regression.
Then you can write $\hat{y}=Z\hat{\beta}_\text{PC}=XW\hat{\beta}_\text{PC}=X\hat{\beta}^*$, say (where $\hat{\beta}^*=W\hat{\beta}_\text{PC}$, obviously), so you can write it as a function of the original predictors. I don't know if that's what you meant by 'reversing', but it's a meaningful way to look at the original relationship between $y$ and $X$. It's not the same as the coefficients you get by estimating a regression on the original $X$'s, of course -- it's regularized by doing the PCA; even though you'd get coefficients for each of your original $X$'s this way, they only have the d.f. of the number of components you fitted.
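A minimal sketch of this back-transformation, assuming $X$ and $y$ are already centered as above (simulated data; scikit-learn for the PCA and the regression):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n, p, k = 200, 99, 40
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # centre X, as assumed in the text
y = X @ rng.normal(size=p) + rng.normal(size=n)
y -= y.mean()

pca = PCA(n_components=k).fit(X)
W = pca.components_.T                    # 99 x 40 matrix of PC weights
Z = X @ W                                # the principal-component scores

beta_pc = LinearRegression(fit_intercept=False).fit(Z, y).coef_
beta_star = W @ beta_pc                  # coefficients on the original X's

# Same fitted values either way: y_hat = Z beta_PC = X beta*
print(np.allclose(Z @ beta_pc, X @ beta_star))   # True
```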
Also see Wikipedia on principal component regression.