I take your question to be: how do you detect when the conditions that make transformations appropriate actually hold, rather than what those conditions are in the abstract. It's always good to bookend a data analysis with exploration, especially graphical data exploration. (Various formal tests can be conducted, but I'll focus on graphical EDA here.)
Kernel density plots give a better initial overview of each variable's univariate distribution than histograms do. With multiple variables, a scatterplot matrix can be handy, and adding lowess smoothers at the start is always advisable: they give a quick and dirty look at whether the relationships are approximately linear. John Fox's car package usefully combines these:
library(car)
scatterplotMatrix(data)  # called scatterplot.matrix() in older versions of car
Be sure to have your variables as columns. If you have many variables, the individual plots can be small; maximize the plot window and the scatterplots should be big enough to pick out the ones you want to examine individually, and then make single plots. E.g.,
dev.new()  # opens a fresh plot window; on Windows you can also use windows()
plot(density(X[,3]))
rug(X[,3])
dev.new()
plot(X[,3], y)
lines(lowess(y ~ X[,3]))
After fitting a multiple regression model, you should still plot and check your data, just as with simple linear regression. QQ plots of the residuals are just as necessary, and you can make a scatterplot matrix of your residuals against your predictors, following a similar procedure as before.
dev.new()
qqPlot(model$residuals)  # called qq.plot() in older versions of car
dev.new()
scatterplotMatrix(cbind(model$residuals, X))
If anything looks suspicious, plot it individually and add abline(h=0) as a visual guide. If you have an interaction, you can create an X[,1]*X[,2] variable and examine the residuals against that. Likewise, you can plot the residuals against X[,3]^2, and so on; other residuals-vs-x plots you like can be made similarly (a sketch follows below). Bear in mind that all of these plots ignore the x dimensions that aren't being plotted. If your data are grouped (e.g., from an experiment), you can make partial plots instead of, or in addition to, marginal plots.
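For instance, here is a minimal sketch reusing the X matrix and fitted model from the snippets above (the names are simply carried over from there):
dev.new()
plot(X[,1] * X[,2], model$residuals)  # residuals vs. a constructed interaction
abline(h = 0)                         # visual guide
dev.new()
plot(X[,3]^2, model$residuals)        # residuals vs. a squared term
abline(h = 0)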
Hope that helps.
Best Answer
If you have an exact linear relationship among your independent variables (slightly more common jargon than predictor variables), such as $c = a + b$, then you cannot meaningfully estimate the regression; in other words, it is misspecified. Statistical software will usually produce an error message here. Intuitively, there is no unique estimator because the collinear variables leave no independent variation to separate their effects; in technical terms, the $X'X$ matrix is not invertible, and you are essentially trying to divide by zero.
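As a quick illustration with simulated data (an assumed setup; note that R's lm() flags the aliased column rather than stopping, while other software may throw an error):
set.seed(1)
a <- rnorm(100); b <- rnorm(100)
c <- a + b                  # exact linear relationship among the predictors
y <- 1 + 2*a - b + rnorm(100)
summary(lm(y ~ a + b + c))  # R drops c: "1 not defined because of singularities"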
If you instead have a strong (but not exact) correlation between, say, $a$ and $c$, then this is just strong multicollinearity. You can estimate the coefficients, but you need to take into account three things:
1. The standard errors of the affected coefficients are inflated, so individual coefficients can look insignificant even when the variables are jointly significant.
2. The estimates become sensitive to small changes in the data or in the model specification.
3. The interpretation of each coefficient changes: it is the effect of that variable holding the correlated variable fixed, which can differ sharply from its marginal effect.
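A minimal simulated sketch of the first point (the data-generating process is an assumption for illustration; vif() comes from the car package):
set.seed(2)
a <- rnorm(200)
c <- a + rnorm(200, sd = 0.05)  # strongly, but not exactly, correlated with a
y <- 1 + 2*a + 0.5*c + rnorm(200)
fit <- lm(y ~ a + c)
summary(fit)   # coefficients are estimable, but with very large standard errors
car::vif(fit)  # variance inflation factors far above the usual cutoff of 10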
So you may be surprised to find that your coefficient $\beta_1$ turns out to be negative when you expected it, on theoretical grounds, to be positive by all means. This can happen when the predictor behind $\beta_1$ is very strongly correlated with intelligence: intelligence may have the greater effect on wages, and $\beta_1$ is adjusted downward by this effect of intelligence on wage. So the estimate is correct, but its interpretation is now different.
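A hypothetical sketch of such a sign flip, labeling the $\beta_1$ predictor "education" (the label and the numbers are assumptions; the text above only names wage and intelligence):
set.seed(3)
intelligence <- rnorm(500)
education <- 0.9*intelligence + rnorm(500, sd = 0.3)  # strongly correlated predictors
wage <- 2*intelligence - 0.5*education + rnorm(500)
coef(lm(wage ~ education))                 # positive: education proxies for intelligence
coef(lm(wage ~ education + intelligence))  # negative: the effect holding intelligence fixed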
However, if you take the change in the interpretation of the coefficients into account and your data are correct, strong multicollinearity still gives you unbiased estimates, and in particular unbiased predictions.
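A small Monte Carlo sketch of that last point (again, the data-generating process is an assumption):
set.seed(4)
preds <- replicate(1000, {
  a <- rnorm(100); c <- a + rnorm(100, sd = 0.05)   # strong multicollinearity
  y <- 1 + 2*a + 0.5*c + rnorm(100)
  predict(lm(y ~ a + c), data.frame(a = 1, c = 1))  # predict at a fixed point
})
mean(preds)  # close to the true value 1 + 2 + 0.5 = 3.5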