The fact that additional variables make large changes in your model doesn't mean there is collinearity, nor does it mean the model isn't "robust" (although I guess that depends on what you mean by "robust"). Nor do a reasonable VIF and a high F mean that additional variables won't have an effect. Only completely independent variables will have no effect on the other coefficients, and variables can be a long way from independent without having a high VIF.
It means that controlling for those additional variables makes a large difference in the other relationships.
You haven't said what any of these variables are, so it's hard to be specific. However, let's imagine simpler models:
Model 1: log(income) as effect of race/ethnic group
Model 2: log(income) as effect of race/ethnic group + age
The coefficients in both models will likely all be significant (if it's a reasonable sample size). But the coefficient of race/ethnicity will (I bet) drop in the 2nd model because the average age of different ethnic groups is different, and income tends to be related to age.
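As a rough illustration with simulated data (the variable names and numbers below are made up purely to show the mechanism):

# Simulated illustration: adding a covariate (age) that is correlated with
# group membership changes the group coefficient, even without a high VIF.
set.seed(42)
n     <- 2000
group <- rbinom(n, 1, 0.5)                      # two groups (0/1)
age   <- 40 + 8 * group + rnorm(n, sd = 10)     # groups differ in average age
log_income <- 10 + 0.02 * age + 0.05 * group + rnorm(n, sd = 0.5)

coef(lm(log_income ~ group))        # Model 1: group effect absorbs the age difference
coef(lm(log_income ~ group + age))  # Model 2: group coefficient drops once age is controlled

Here the group-age correlation is only about 0.4, so the VIF in Model 2 is modest, yet the group coefficient changes substantially between the two models.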
EDIT in response to edit in post:
Since you have country-level and company-level variables, you should not use OLS regression, as the data are not independent. You need to account for this. One way is with mixed models (aka multilevel models). I don't know how SPSS does this, but in SAS it would be PROC MIXED, and in R it would be nlme or lme4.
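For example, a minimal lme4 sketch (the data frame and variable names are hypothetical placeholders for your own outcome, company-level predictors, and country identifier):

# Random intercept for country, fixed effects for company-level predictors
library(lme4)
fit <- lmer(outcome ~ firm_size + sector + (1 | country), data = dat)
summary(fit)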
If we look at the function
library(car)
getS3method("vif", "default")
#R function (mod, ...)
#R {
#R v <- vcov(mod)
#R assign <- attr(model.matrix(mod), "assign")
#R [...]
#R terms <- labels(terms(mod))
#R n.terms <- length(terms)
#R [...]
#R R <- cov2cor(v)
#R detR <- det(R)
#R result <- matrix(0, n.terms, 3)
#R rownames(result) <- terms
#R colnames(result) <- c("GVIF", "Df", "GVIF^(1/(2*Df))")
#R for (term in 1:n.terms) {
#R subs <- which(assign == term)
#R result[term, 1] <- det(as.matrix(R[subs, subs])) * det(as.matrix(R[-subs,
#R -subs]))/detR
#R result[term, 2] <- length(subs)
#R }
#R if (all(result[, 2] == 1))
#R result <- result[, 1]
#R else result[, 3] <- result[, 1]^(1/(2 * result[, 2]))
#R result
#R }
then we see that it calls vcov, which differs for a glm versus an lm. In the glm case it depends on the outcome; thus, you get the different results. In the linear model case, all of the above is consistent with the 1992 article
Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87(417), 178-183.
See particularly Equation (10) and this part of the function:
#R result[term, 1] <- det(as.matrix(R[subs, subs])) * det(as.matrix(R[-subs,
#R -subs]))/detR
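A small simulated check of this point (hypothetical data): fitting an lm and a glm to the same predictors gives different vif() values, because the computation runs through vcov() of the fitted model.

library(car)

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)                     # correlated predictors
y_cont <- x1 + x2 + rnorm(n)                  # continuous outcome
y_bin  <- rbinom(n, 1, plogis(x1 + x2))       # binary outcome

vif(lm(y_cont ~ x1 + x2))                     # uses vcov() of the linear model
vif(glm(y_bin ~ x1 + x2, family = binomial))  # uses vcov() of the logistic model; values differ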
To the question

Is the variance inflation factor useful for GLM models?

I gather that the results in the 1992 article may still hold asymptotically. However, some pen and paper is likely needed to justify this claim, and I may be wrong.
Best Answer
First, if you are going with one of the usual methods of testing for collinearity, condition indexes are better than variance inflation factors. This was shown by David Belsley in his two books; I also wrote about it in my dissertation.
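For reference, condition indexes can be computed directly from the model matrix in base R (a sketch of Belsley's approach; the full diagnostics also include variance-decomposition proportions, which are omitted here):

# Condition indexes a la Belsley: scale each column of the model matrix
# (including the intercept) to unit length, then take the ratio of the
# largest singular value to each singular value.
condition_indexes <- function(mod) {
  X <- model.matrix(mod)
  X <- scale(X, center = FALSE, scale = sqrt(colSums(X^2)))
  d <- svd(X)$d
  max(d) / d
}

# condition_indexes(fit)   # values above ~30 are the usual rule-of-thumb warning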
Second, I think that the methods used in the perturb package in R are very promising. The idea there is to add small amounts of noise to the data and then see how that affects the results. Not only does this make intuitive sense to me, but it also allows you to test for collinearity in categorical variables and makes no assumptions about the model.
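The perturbation idea can also be sketched by hand (this is a rough illustration of the approach, not the perturb package's own interface; the model, data frame, and variable names are hypothetical):

perturb_coefs <- function(fit, dat, vars, noise_sd = 0.05, niter = 200) {
  # Refit the model 'niter' times after adding small random noise to the
  # chosen numeric predictors, and summarise how much each coefficient moves.
  reps <- replicate(niter, {
    d <- dat
    for (v in vars) d[[v]] <- d[[v]] + rnorm(nrow(d), sd = noise_sd)
    coef(update(fit, data = d))
  })
  t(apply(reps, 1, function(b) c(mean = mean(b), sd = sd(b))))
}

# perturb_coefs(fit, dat, vars = c("x1", "x2"))   # unstable coefficients show large sd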