What happens when all PCs are used?
If all PCs are used, the resulting regression coefficients are identical to those obtained with OLS regression, so the procedure hardly deserves the name "principal component regression" at all. It is standard regression, only performed in a roundabout way.
You are asking how it is possible that nothing is gained, given that after PCA the predictors become orthogonal. The devil hides in the back-transformation of the regression coefficients from the PCA space to the original space. What you need to know is that the covariance matrix of the estimated regression coefficients is proportional to the inverse of the covariance matrix of the predictors. The PCA-transformed predictors, let's call them $Z$, have a diagonal covariance matrix (because they are uncorrelated). So all regression coefficients for $Z$ are also uncorrelated; the ones corresponding to the high-variance PCs have low variance (i.e. are estimated reliably), and the ones corresponding to the low-variance PCs have high variance (i.e. are estimated unreliably). When these coefficients are back-transformed to the original predictors $X$, each predictor $X_i$ gets some portion of the unreliable estimates, and so all coefficients can become unreliable.
So nothing is gained.
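As a quick sanity check, here is a small NumPy sketch (the simulated data and all variable names are my own illustration, not from the original question) that regresses on all PCs and then back-transforms the coefficients; it reproduces the OLS solution exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated predictors
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.normal(size=n)

# Center the data (PCA assumes centered predictors; the intercept is handled separately)
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Plain OLS coefficients
beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]

# PCA via eigendecomposition of the predictor covariance matrix
_, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ V                                      # scores: all p principal components
gamma = np.linalg.lstsq(Z, yc, rcond=None)[0]   # regression on the PCs
beta_pcr = V @ gamma                            # back-transform to the original predictors

print(np.allclose(beta_ols, beta_pcr))  # True
```

Since $Z = X_cV$ with $V$ orthogonal, regressing on $Z$ and multiplying by $V$ is just a change of basis, which is why the two solutions coincide to numerical precision.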
What happens when only a few PCs are used?
When not all the PCs are retained in PCR, the resulting solution $\hat \beta_\mathrm{PCR}$ will generally not be equal to the standard ordinary least squares solution $\hat \beta_\mathrm{OLS}$. It is a standard result that the OLS solution is unbiased: see the Gauss-Markov theorem. "Unbiased" means that $\hat \beta$ is correct on average, even though it can be very noisy. Since the PCR solution differs from it, it will be biased, meaning that it will be incorrect on average. However, it often happens to be substantially less noisy, leading to overall more accurate predictions.
This is an example of the bias-variance trade-off. See Why does shrinkage work? for some further general discussion.
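A minimal simulation can illustrate this trade-off. The design below is my own choice of numbers, not a canonical benchmark: a collinear $X$ with two nearly degenerate directions, so that OLS coefficient estimates are very noisy, while PCR keeping the top 3 PCs is biased but far more stable:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 5, 3              # keep only k of p PCs
# Collinear design: two directions carry almost no variance
A = rng.normal(size=(p, p)) * [10.0, 5.0, 1.0, 0.02, 0.01]
X = rng.normal(size=(n, p)) @ A
X -= X.mean(axis=0)
beta_true = rng.normal(size=p)

_, V = np.linalg.eigh(np.cov(X, rowvar=False))
Vk = V[:, -k:]                  # eigh sorts ascending: last k = top-variance PCs
Z = X @ Vk

ols_err, pcr_err = [], []
for _ in range(500):            # many noise realizations with the same design
    y = X @ beta_true + rng.normal(size=n)
    y -= y.mean()
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_pcr = Vk @ np.linalg.lstsq(Z, y, rcond=None)[0]
    ols_err.append(np.sum((b_ols - beta_true) ** 2))
    pcr_err.append(np.sum((b_pcr - beta_true) ** 2))

# PCR is biased but much less noisy: its mean squared estimation error is far smaller
print(np.mean(ols_err), np.mean(pcr_err))
```

The unbiased OLS estimates blow up along the two tiny-variance directions, whereas PCR simply drops them, accepting a small bias in exchange for a large variance reduction.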
In the comments, @whuber pointed out that the PCR solution does not have to differ from the OLS one and hence does not have to be biased. Indeed, if the dependent variable $y$ is uncorrelated (in population, not in sample) with all the low-variance PCs that are not included in the PCR model, then dropping these PCs will not influence the unbiasedness. This, however, is unlikely to be the case in practice: PCA is conducted without taking $y$ into account so it stands to reason that $y$ will tend to be somewhat correlated with all the PCs.
Why is using high-variance PCs a good idea at all?
This was not part of the question, but you might be interested in the following thread for further reading: How can top principal components retain the predictive power on a dependent variable (or even lead to better predictions)?
Removing a dimension from a data cloud, such as removing its 1st PC, amounts to projecting the data points onto the (hyper)plane perpendicular to that dimension's axis. Imagine, as an example, that your data form a spheroid in 3D space. PC1 is the spheroid's main axis. Removing it means projecting onto the plane that this axis pierces at a 90-degree angle. You are then left with a circular data cloud lying in that plane.
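A short sketch of this projection (the spheroid-like cloud and all variable names are my own illustration): projecting out PC1 leaves a cloud with zero extent along that axis, i.e. a flat cloud in the perpendicular plane.

```python
import numpy as np

rng = np.random.default_rng(2)
# Spheroid-like 3D cloud: one long axis (this will be PC1) and two shorter ones
X = rng.normal(size=(1000, 3)) * [5.0, 2.0, 2.0]
X -= X.mean(axis=0)

_, V = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = V[:, -1]                          # direction of largest variance

# Project each point onto the plane perpendicular to PC1
X_flat = X - np.outer(X @ pc1, pc1)

print(np.allclose(X_flat @ pc1, 0))     # True: no extent left along PC1
```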
Imagine data points filling a 2D rectangle in the center of the coordinate system, with its sides oriented along the coordinate axes: from $-a$ to $a$ along the $x$-axis, and from $-b$ to $b$ along the $y$-axis.
The projection on $x$ is a uniform distribution with variance $a^2/3$. The projection on $y$ is also a uniform distribution, with variance $b^2/3$. Since $x$ and $y$ are obviously not correlated (if this is not obvious, ask yourself whether the correlation should be positive or negative; due to symmetry it can only be zero), the covariance between them is zero. This yields the covariance matrix $$\left(\begin{array}{cc}a^2/3&0\\0&b^2/3\end{array}\right).$$ The task of PCA is to diagonalize the covariance matrix. But this one is already diagonal! This means that no rotation is necessary, and the $x$-axis and $y$-axis are themselves principal axes. If e.g. $a>b$, then the $x$-axis is the first PC.
This might be a bit counter-intuitive: it might seem that a projection on the diagonal should have a larger variance than the projection on the longer side; but in fact it does not.
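This is easy to verify numerically. The sketch below (my own illustration, with $a=3$, $b=1$) estimates the covariance matrix from a uniform sample of the rectangle and also the variance of the projection on the diagonal, which comes out as the *average* of the two side variances, smaller than $a^2/3$:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 3.0, 1.0
# Uniform sample from the rectangle [-a, a] x [-b, b]
X = rng.uniform([-a, -b], [a, b], size=(100_000, 2))

C = np.cov(X, rowvar=False)
print(C)                 # close to [[a**2/3, 0], [0, b**2/3]]

# Variance of the projection on the diagonal direction (x + y)/sqrt(2):
# since x and y are uncorrelated it equals (a**2/3 + b**2/3)/2 = (a**2 + b**2)/6,
# which is smaller than the longer side's variance a**2/3
d = np.array([1.0, 1.0]) / np.sqrt(2)
print(np.var(X @ d))     # close to (a**2 + b**2)/6
```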
Bonus: Dzhanibekov effect
You seem to have meant a 3D rectangular parallelepiped instead of a 2D rectangle. The argument of course stays the same: the covariance matrix is $3\times 3$ but still diagonal, with the principal axes being the coordinate axes.
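The same numerical check extends to 3D (again an illustrative sketch with side lengths of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b, c = 3.0, 2.0, 1.0
# Uniform sample from the box [-a, a] x [-b, b] x [-c, c]
X = rng.uniform([-a, -b, -c], [a, b, c], size=(200_000, 3))

C = np.cov(X, rowvar=False)
print(np.round(C, 2))  # approximately diag(a**2/3, b**2/3, c**2/3)
```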
Incidentally, there is a curious effect in mechanics concerning a rotating solid body with three different moments of inertia (a mechanics analog of variance). It turns out that rotations around the axes with the largest and the smallest moments of inertia are stable, but rotation around the axis with the middle moment of inertia is unstable. Moreover, such a rotating body experiences sudden "flips", which is known as the Dzhanibekov effect -- after a Russian cosmonaut who observed it in space. One can easily observe it by spinning a book or a table tennis racket. See the following great threads on mathoverflow and on physics.SE and these videos: