What happens when all PCs are used?
If all PCs are used, the resulting regression coefficients are identical to those obtained with OLS regression, so the procedure hardly deserves the name "principal component regression". It is standard regression, merely performed in a roundabout way.
You are asking how it is possible that nothing is gained, given that after PCA the predictors become orthogonal. The devil hides in the back-transformation of the regression coefficients from the PCA space to the original space. The key fact is that the variance of the estimated regression coefficients depends inversely on the covariance matrix of the predictors: $\operatorname{Var}(\hat \beta) = \sigma^2 (X^\top X)^{-1}$. The PCA-transformed predictors, let's call them $Z$, have a diagonal covariance matrix (because they are uncorrelated). So all regression coefficients for $Z$ are also uncorrelated; the ones corresponding to the high-variance PCs have low variance (i.e. are estimated reliably) and the ones corresponding to the low-variance PCs have high variance (i.e. are estimated unreliably). When these coefficients are back-transformed to the original predictors $X$, each original coefficient mixes in some portion of the unreliable estimates, and so all coefficients can become unreliable.
So nothing is gained.
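For the skeptical reader, here is a minimal numpy sketch (simulated data of arbitrary size, all names are placeholders) verifying that regressing on all PCs and rotating the coefficients back reproduces the OLS solution exactly:

```python
# Regressing on ALL principal components and back-transforming the
# coefficients reproduces the OLS solution exactly.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Center X and y so that PCA is a pure rotation of the predictors.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# OLS on the original predictors.
beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# PCA via SVD: Z = Xc @ V holds the principal component scores.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T

# OLS on all PCs, then rotate the coefficients back: beta = V @ gamma.
gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)
beta_pcr = Vt.T @ gamma

print(np.allclose(beta_ols, beta_pcr))  # True: identical coefficients
```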
What happens when only few PCs are used?
When not all the PCs are retained in PCR, the resulting solution $\hat \beta_\mathrm{PCR}$ will generally not equal the standard ordinary least squares solution $\hat \beta_\mathrm{OLS}$. It is a standard result that the OLS solution is unbiased: see the Gauss-Markov theorem. "Unbiased" means that $\hat \beta$ is correct on average, even though it can be very noisy. Since the PCR solution differs from it, it is biased, meaning that it is incorrect on average. However, it often turns out to be substantially less noisy, leading to more accurate predictions overall.
This is an example of the bias-variance trade-off. See Why does shrinkage work? for some further general discussion.
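To make this concrete, here is a small simulation sketch (all sizes and data are invented for illustration) in which the predictors are driven by a few latent factors, so they are nearly collinear; the OLS coefficients are unbiased but very noisy, while PCR on the top components is biased yet far more stable:

```python
# With nearly collinear predictors, OLS coefficients are unbiased but very
# noisy, while PCR coefficients are biased yet far more stable.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 10, 3

# Predictors driven by k latent factors plus a little noise -> near collinearity.
F = rng.normal(size=(n, k))
X = F @ rng.normal(size=(k, p)) + 0.01 * rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

Xc, yc = X - X.mean(axis=0), y - y.mean()

# OLS on the original predictors (high-variance estimates here).
beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# PCR: regress on the k highest-variance PCs only, then rotate back.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Zk = Xc @ Vt[:k].T
gamma, *_ = np.linalg.lstsq(Zk, yc, rcond=None)
beta_pcr = Vt[:k].T @ gamma

# Estimation error of each approach; PCR is biased but usually closer here.
print("OLS error:", np.linalg.norm(beta_ols - beta_true))
print("PCR error:", np.linalg.norm(beta_pcr - beta_true))
```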
In the comments, @whuber pointed out that the PCR solution does not have to differ from the OLS one and hence does not have to be biased. Indeed, if the dependent variable $y$ is uncorrelated (in the population, not in the sample) with all the low-variance PCs that are not included in the PCR model, then dropping these PCs will not affect the unbiasedness. This, however, is unlikely to be the case in practice: PCA is conducted without taking $y$ into account, so it stands to reason that $y$ will tend to be somewhat correlated with all the PCs.
Why is using high-variance PCs a good idea at all?
This was not part of the question, but you might be interested in the following thread for further reading: How can top principal components retain the predictive power on a dependent variable (or even lead to better predictions)?
Best Answer
Regression based on principal components analysis (PCA) of the independent variables is certainly a way to approach this problem; see this Cross Validated page for one extensive discussion of pros and cons, with links to further related topics. I don't see the point of the regression you propose after choosing the largest components. The "reconstructed" independent variables might suffer from being too highly dependent on the particular sample on which you based the model, and stepwise selection is generally not a good idea. Cross-validation would be a better way to choose the number of components to retain, finding the number of components that minimizes cross-validation error.
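As a sketch of that suggestion (using scikit-learn; the data here are random placeholders standing in for your 5 predictors), cross-validating over the number of retained components might look like:

```python
# Choose the number of PCs by cross-validation rather than stepwise selection.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(size=100)

pcr = Pipeline([
    ("scale", StandardScaler()),   # standardize so scale doesn't drive PCA
    ("pca", PCA()),
    ("ols", LinearRegression()),
])

# Search over the number of retained components, scored by CV error.
search = GridSearchCV(
    pcr,
    param_grid={"pca__n_components": list(range(1, 6))},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```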
In your situation, with only 5 predictors you might be just as well served by a standard linear model. Unless you have extremely high correlations among some of your variables, you are unlikely to have the numerical instability issues that can arise in extreme cases. (And if you do have two very highly correlated predictors, you should consider using your knowledge of the subject matter rather than an automated approach to choose one.) Paying attention to model diagnostics will help determine whether the linear model is reasonable.
A standard regression model provides easier-to-interpret coefficients and might be easier to explain to others than PCA. For predictions from a linear model you should consider including all 5 independent variables (even those that aren't "statistically significant"), both because of the limitations of stepwise selection and because the relations of some predictors to the dependent variable will differ if other predictors are removed.
If you have very high collinearity in a standard linear regression then it should show up as large standard errors on the corresponding coefficients, and you might consider approaches noted here like ridge regression to get useful information from all your predictors without overfitting. Ridge regression can be considered a continuous version of the PCA-regression approach, where principal components are differentially weighted instead of being either wholly in or wholly out of the final model; see section 3.5 of Elements of Statistical Learning.
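That connection can be seen directly in the SVD: writing $X = USV^\top$, ridge multiplies the OLS coefficient of each principal component direction by $s_j^2/(s_j^2 + \lambda)$, a factor between 0 and 1, whereas PCR uses weights of exactly 1 or 0. A small numpy check of this identity (simulated data, arbitrary $\lambda$):

```python
# Ridge via the SVD: each PC direction's OLS coefficient is multiplied by
# s_j^2 / (s_j^2 + lam) rather than kept (PCR weight 1) or dropped (0).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(size=100)
Xc, yc = X - X.mean(axis=0), y - y.mean()
lam = 10.0  # an arbitrary penalty for illustration

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Closed-form ridge solution expressed through the SVD:
# beta = V diag(s / (s^2 + lam)) U^T y
beta_ridge_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ yc))

# Same answer from the normal equations (X^T X + lam I) beta = X^T y.
p = Xc.shape[1]
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
print(np.allclose(beta_ridge_svd, beta_ridge))  # True
```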
For your second and third questions:
The first page I linked above does a pretty good job of addressing your second question. Yes, choosing a limited number of principal components can help reduce the problems associated with collinearity, as collinear variables will tend to enter the same principal components together. Two warnings: the predictors should be standardized so that differences in scale don't drive the construction of the principal components, and there is no assurance that the components capturing the greatest variation in the predictors will be the ones most closely related to the dependent variable. A small illustration of the first warning follows.
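Here is a minimal sketch of that first warning (simulated data, with one column rescaled to mimic a predictor measured in different units): without standardization the rescaled column dominates the first component.

```python
# Without standardization, a predictor measured on a larger scale dominates PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
z = rng.normal(size=200)  # a shared underlying factor
X = np.column_stack([z + 0.5 * rng.normal(size=200) for _ in range(3)])
X[:, 0] *= 100.0          # same information, different units (e.g. cm vs m)

pca_raw = PCA(n_components=1).fit(X)
pca_std = PCA(n_components=1).fit(StandardScaler().fit_transform(X))

print(pca_raw.components_)  # loads almost entirely on the rescaled column
print(pca_std.components_)  # roughly equal loadings after standardization
```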
With respect to your third question, a stepwise approach is inappropriate, as you recognize. I don't see a reason why you couldn't include interaction terms among your selected principal components in a regression, but they would be extremely hard to interpret. That's another reason why I would lean here toward working with the original independent variables rather than with their transformations into principal components.
You seem very interested in using PCA for this predictive model, but remember that it's easy to get fixated on a particular approach. You are in a very good position to compare several approaches, combined with appropriate cross-validation or bootstrapping techniques, to see which works best for your particular needs. If that ends up being PCA that's good, but don't dismiss the other possibilities out of hand.