Back in the late 1990s, I did my dissertation on collinearity.
My conclusion was that condition indexes were best.
The main reason was that, rather than looking at individual variables, condition indexes let you look at sets of variables. Since collinearity is a property of sets of variables, that is a good thing.
Also, the results of my Monte Carlo study showed that they had better sensitivity to problematic collinearity than the alternatives, but I have long since forgotten the details.
On the other hand, condition indexes are probably the hardest diagnostic to explain. Lots of people know what $R^2$ is; only a small subset of those people have heard of eigenvalues. However, when I have used condition indexes as a diagnostic tool, I have never been asked for an explanation.
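If a concrete computation helps, here is a minimal numpy sketch of the usual Belsley-style recipe (nothing specific to my dissertation; the simulated data and the rough rule of thumb in the comments are just the common conventions): scale each column of the design matrix to unit length, take the singular values, and divide the largest by each of them.

```python
import numpy as np

def condition_indexes(X):
    """Belsley-style condition indexes for a design matrix X (n x p).

    Each column (including the intercept column) is scaled to unit
    length; the condition indexes are then the largest singular value
    of the scaled matrix divided by each singular value.  Indexes above
    roughly 30 are conventionally read as signs of harmful collinearity
    involving a *set* of columns.
    """
    Xs = X / np.linalg.norm(X, axis=0)        # unit-length columns
    s = np.linalg.svd(Xs, compute_uv=False)   # singular values
    return s.max() / s

# Toy illustration: x2 is almost a copy of x1, so one index blows up.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X = np.column_stack([np.ones(200), x1, x2])   # intercept + two predictors
print(condition_indexes(X))
```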
For much more on this, check out the books by David Belsley. Or, if you really want to, you can get my dissertation, *Multicollinearity diagnostics for multiple regression: A Monte Carlo study*.
When variables are collinear, you can sometimes think of them as different manifestations of the same thing. Say I had a dataset of cat happiness, a variable for whether each cat was soaking wet, and a variable for whether or not there were nearby children who thought it was fun to throw cats into water. Clearly cats don't like water, yet sometimes they fall into it on their own. More often, however, they are thrown in by malevolent children. Sometimes, however, malevolent children fail to throw cats into the water.
So, wet cats and malevolent children are different, but can be thought of as a unitary dynamic. If a researcher were only interested in the effect of wetness on cat happiness and didn't control for malevolent children, the estimates would be biased. Include malevolent children, and the VIF goes up. This is because you simply don't have enough independent observations of wetness to know its effect apart from the effect of malevolent children.
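To put rough numbers on that, here is a hedged simulation of the cat story (everything below, from the coefficient values to the use of statsmodels' `variance_inflation_factor`, is made up purely for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 500
children = rng.binomial(1, 0.5, n)        # malevolent children nearby?
# Cats are usually wet when the children are around, rarely otherwise.
wet = np.where(children == 1,
               rng.binomial(1, 0.95, n),
               rng.binomial(1, 0.05, n))
happiness = 10 - 4 * wet - 1 * children + rng.normal(size=n)

X = sm.add_constant(np.column_stack([wet, children]))
fit = sm.OLS(happiness, X).fit()
print(fit.summary())
# VIFs for wet and children; each would be exactly 1 if the other were dropped.
print([variance_inflation_factor(X, i) for i in range(1, X.shape[1])])
```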
Shrinkage estimators are one way to go. Basically, you increase the bias of your estimator in order to decrease its variance. Appealing for prediction, but not for inference.
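A quick sketch of that trade-off, using scikit-learn's `Ridge` as the shrinkage estimator and simulated data (the penalty value and all the numbers are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)

def one_draw():
    """Simulate one dataset with two nearly collinear predictors."""
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    return np.column_stack([x1, x2]), y

ols_coefs, ridge_coefs = [], []
for _ in range(200):
    X, y = one_draw()
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=10.0).fit(X, y).coef_)   # L2 penalty = shrinkage

# Ridge trades a little bias (means drift slightly from the true 1.0, 1.0)
# for a large drop in variance (much smaller sds than OLS across repeated samples).
print("OLS   mean / sd:", np.mean(ols_coefs, axis=0), np.std(ols_coefs, axis=0))
print("Ridge mean / sd:", np.mean(ridge_coefs, axis=0), np.std(ridge_coefs, axis=0))
```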
If you're willing to put aside (or think differently about) inference on individual model terms, you could first do a principal components analysis, "interpret" your principal components somehow, and then fit your regression to the rotated dataset. The collinearity will be gone, but you're only able to conduct inference on these PCs, which may or may not have a convenient interpretation. In the case of wet cats and malevolent children, the first PC would increase as the probability of wetness got higher and as the probability of malevolent children increased. The other PC would be perpendicular to it, relating to wetness as the probability of malevolent children decreased. If you simply wanted to know the effect of wetness absent malevolent children, you'd be interested in the coefficient on the second PC. Most PC regressions don't have an interpretation this straightforward, however.
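Here's a rough sketch of that approach with simulated continuous stand-ins for the wet cats and the malevolent children (the names, coefficients, and the use of scikit-learn are all just for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 300
children = rng.normal(size=n)                          # propensity for malevolent children
wet = 0.9 * children + rng.normal(scale=0.4, size=n)   # wetness tracks the children closely
happiness = -2.0 * wet + rng.normal(size=n)

X = np.column_stack([wet, children])
pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # rotated scores: PC1 is roughly the shared wet+children axis,
                                    # PC2 is roughly wetness moving against children
fit = LinearRegression().fit(Z, happiness)
print(pca.components_)              # how each PC loads on wet and children
print(fit.coef_)                    # inference now attaches to the PCs, not to wet or children directly
```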
It is also worth emphasizing that prediction from a model with high collinearity is fine. So if your F-stat is good and you don't care about any of the coefficients individually, leave the model as it is.
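A rough way to see this (simulated data; nothing here is tied to any real dataset): fit the same badly collinear model on two halves of a sample, watch the individual coefficients jump around, and then notice that the fitted values the two fits produce are nearly identical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # severe collinearity
y = 3.0 * x1 + rng.normal(size=n)
X = np.column_stack([x1, x2])

fit_a = LinearRegression().fit(X[:200], y[:200])
fit_b = LinearRegression().fit(X[200:], y[200:])
print(fit_a.coef_, fit_b.coef_)            # individual coefficients typically differ a lot between halves

# ...but predictions at new points that follow the same collinear pattern barely differ.
x_new = np.linspace(-2, 2, 5)
X_new = np.column_stack([x_new, x_new])
print(fit_a.predict(X_new))
print(fit_b.predict(X_new))
```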
Best Answer
The multicollinearity problem is well studied in most econometrics textbooks. Moreover, there is a good Wikipedia article that summarizes most of the key issues.
In practice, one starts to bear the multicollinearity problem in mind when it causes visible signs of parameter instability (most of them implied by the poor invertibility of the $X^TX$ matrix).
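As a small illustration of that instability (simulated data, plain numpy, arbitrary numbers): when two columns are nearly copies of each other, the condition number of $X^TX$ explodes and a tiny perturbation of the response moves the individual estimates a long way.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.001, size=n)      # x2 is nearly a copy of x1
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=n)

print(np.linalg.cond(X.T @ X))                 # enormous condition number: X'X is close to singular

beta = np.linalg.lstsq(X, y, rcond=None)[0]
beta_perturbed = np.linalg.lstsq(X, y + rng.normal(scale=0.1, size=n), rcond=None)[0]
print(beta)                                    # the collinear pair of estimates lands far from (2, 2)
print(beta_perturbed)                          # and a small nudge to y swings them again
```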
Should you simply drop some of the variables? Probably not, theoretically, since it may happen (and usually is the case) that you need all the variables to be present in the model. Excluding relevant variables (the omitted-variable problem) will give you biased and inconsistent parameter estimates anyway. On the other hand, you may be forced to include all the focus variables simply because your analysis is built around them. In a data-mining approach, though, you are more technical about searching for the best fit.
So keep in mind the alternatives (the ones I would use myself), such as the shrinkage and principal-components approaches discussed above; some other tricks are mentioned in the Wikipedia article noted earlier.