There are two points here:
The passage recommends transforming IVs to linearity only when there is evidence of nonlinearity. Nonlinear relationships among IVs can also cause collinearity and, more centrally, may complicate other relationships. I am not sure I agree with the advice in the book, but it's not silly.
Certainly very strong linear relationships can be causes of collinearity, but high correlations are neither necessary nor sufficient to cause problematic collinearity. A good method of diagnosing collinearity is the condition index.
EDIT in response to comment
Condition indexes are described briefly here as "square root of the maximum eigenvalue divided by the minimum eigenvalue". There are quite a few posts here on CV that discuss them and their merits. The seminal texts on them are two books by David Belsley: Conditioning Diagnostics and Regression Diagnostics (which has a new edition, 2005, as well).
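As a sketch of the computation (the near-collinear design here is made up for illustration; scaling the columns to unit length before taking singular values follows Belsley's recommendation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)

# Design matrix with intercept; scale each column to unit length
# before computing singular values.
X = np.column_stack([np.ones(n), x1, x2, x3])
Xs = X / np.linalg.norm(X, axis=0)

# Condition indexes: largest singular value divided by each singular
# value (equivalently, sqrt of the ratio of eigenvalues of Xs'Xs).
s = np.linalg.svd(Xs, compute_uv=False)
condition_indexes = s[0] / s
print(condition_indexes)   # a large index (rule of thumb: > 30) flags the x1/x2 dependency
```

Note the diagnosis doesn't rely on any pairwise correlation being extreme; it picks up near-linear dependencies involving any number of columns.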
The skewness of the outcome variable (treated unconditionally on the other variables) will depend on the arrangement of the independent variables -- it might validly be anything. You shouldn't be trying to make the distribution of the outcome look like any particular thing. It's the error term the normal assumption is needed for.
Normality of residuals probably isn't all that important compared to the other assumptions (unless you're after prediction intervals) -- you will want to focus more on getting the models for the mean and variance right.
That said, if a log-transform produces slightly left skew residuals, you might possibly do better with a Gamma GLM (the log of a gamma random variable is left skew, the degree of skewness depends on the gamma's shape parameter). Aside from that, the Gamma model with a log link has a lot of similarities to a linear model in the logs. This also has the advantage of readily dealing with other nonlinear relationships between the conditional mean of the outcome and the linear predictor (linear combination of the independent variables) by choice of a different link function.
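The claim about the skewness of a logged gamma variate can be checked with a quick simulation (the shape values below are arbitrary, chosen just to show the trend):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_skewness(z):
    """Moment-based skewness estimate."""
    z = z - z.mean()
    return (z ** 3).mean() / (z ** 2).mean() ** 1.5

# The log of a Gamma variate is left skew, and the skewness
# weakens as the shape parameter grows.
skews = {shape: sample_skewness(np.log(rng.gamma(shape, size=100_000)))
         for shape in (0.5, 2.0, 10.0)}
print(skews)   # all negative; the magnitude shrinks as shape grows
```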
(That is, if such a GLM is suitable; and again, the model for the mean and variance matters more than the distributional assumption. A Gamma GLM implies heteroskedasticity, specifically a constant coefficient of variation; if there's no evidence of this in your data, you may not be better off than with linear regression.)
And if I were to use the model based on the transformed data, how would I properly interpret the output?
If you assume approximate normality of the logs, it implies that your linear, additive-error model on the log-scale is a multiplicative lognormal model on the original scale.
I find it easier to interpret natural logs rather than base 10 logs (not least, I have a lot more practice at it), but since one is simply a scaled version of the other, most of the intuition carries across.
On the log scale, a unit change in one of your independent variables, $x_j$, produces an additive change of the corresponding coefficient, $\beta_j$, in the outcome. On the original scale, a unit change in the independent variable multiplies the typical outcome (e.g. the mean, or the median; the effect on either is the same) by $10^{\beta_j}$.
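For instance, with base-10 logs and a hypothetical coefficient of 0.3:

```python
# Hypothetical coefficient from regressing log10(y) on the IVs:
beta_j = 0.3

# Additive on the log scale, multiplicative on the original scale:
multiplier = 10 ** beta_j
print(multiplier)   # ~1.995: a unit increase in x_j roughly doubles
                    # the typical (median) outcome
```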
Beware: if you want to make statements about the (conditional) mean of the outcome (rather than changes in it, as discussed in the previous paragraph), you don't just take $10^\text{mean on the log scale}$. If you need to do this I can provide more details about the calculation under the normal assumption. (This is not an issue for the GLM approach, since it models the mean directly rather than via a transform)
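A simulation sketch of that warning, assuming base-10 logs and made-up values for the mean and sd on the log scale:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 0.4   # mean and sd of log10(Y); assumed values

y = 10 ** rng.normal(mu, sigma, size=1_000_000)

naive = 10 ** mu   # back-transforming the log-scale mean gives the MEDIAN
# The lognormal mean needs a variance correction (sigma converted to natural logs):
corrected = 10 ** mu * np.exp((sigma * np.log(10)) ** 2 / 2)

print(np.median(y), naive)     # these agree
print(y.mean(), corrected)     # these agree; the naive value is too small
```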
However, prediction intervals, for example, transform back just fine.
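A minimal sketch of that back-transform (the predicted value and standard error are made up; a real application would take them from the fitted model):

```python
# Hypothetical prediction on the log10 scale for a new observation:
pred_log = 1.2    # predicted log10(y)
se_pred = 0.25    # standard error of prediction (residual + estimation variance)

z = 1.96          # approximate 97.5% standard normal quantile
lo_log, hi_log = pred_log - z * se_pred, pred_log + z * se_pred

# Exponentiation is monotone, so coverage is preserved: just
# transform the endpoints back to the original scale.
lo_y, hi_y = 10 ** lo_log, 10 ** hi_log
print(lo_y, hi_y)   # an asymmetric interval around 10**pred_log
```

Note the resulting interval is asymmetric on the original scale, which is exactly what you want for a right-skewed outcome.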
Best Answer
Okay, a few things.
1) I always advise against using tests for normality. They answer a question you already know the answer to, i.e. "Is your data normal?" (The answer is no because nothing is normal) vs the question "Is the lack of normality going to be a problem?" which is the question you should be interested in.
2) The assumption of normality is not so much about the predictive performance, but rather the correctness of the inference you would perform (hypothesis tests and confidence intervals).
3) Some deviation from normality is okay, because we have asymptotics that drive test statistics to normality.
4) Your QQ-plot does not appear severely non-normal (although there might be some bimodality in your residuals; you may want to check whether there is an omitted variable or something similar). As another commenter stated, normality is the assumption that can tolerate some failure (mild to moderate deviations from it).
5) So to answer your question
(i) Yes, you do the log transform (or some other transformation) first.
(ii) Once you transform your variable, the nonnormality should be much less of a problem. EDIT: it may be worth looking into why the residuals seem to fall into two distinct clusters.
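The point in 1) above can be illustrated with a quick simulation (using scipy's `normaltest`; the t-distribution and sample sizes are chosen just for the demo):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A t-distribution with 30 df is visually indistinguishable from a normal,
# yet with enough data a normality test rejects it decisively.
small = rng.standard_t(df=30, size=50)
large = rng.standard_t(df=30, size=1_000_000)

p_small = stats.normaltest(small).pvalue
p_large = stats.normaltest(large).pvalue
print(p_small, p_large)   # the large sample is "significantly" non-normal
```

The test answers "is the data exactly normal?" (it never is), not "does the non-normality matter here?", and its power to detect irrelevant deviations grows with sample size.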