I cannot reproduce exactly this phenomenon, but I can demonstrate that VIF does not necessarily increase as the number of categories increases.
The intuition is simple: categorical variables can be made orthogonal by suitable experimental designs. Therefore, there should in general be no relationship between numbers of categories and multicollinearity.
Here is an R function to create categorical datasets with specifiable numbers of categories (for two independent variables) and a specifiable amount of replication for each combination. It represents a balanced study in which every combination of categories is observed an equal number of times, $n$:
library(car)  # provides vif()

trial <- function(n, k1=2, k2=2) {
  # One row per combination of the k1 and k2 category codes
  df <- expand.grid(1:k1, 1:k2)
  # Replicate the full design n times: a balanced study
  df <- do.call(rbind, lapply(1:n, function(i) df))
  # Response drawn independently of the design
  df$y <- rnorm(k1*k2*n)
  fit <- lm(y ~ Var1 + Var2, data=df)
  vif(fit)
}
Applying it, I find the VIFs are always at their lowest possible value, $1$, reflecting the balance (which translates to orthogonal columns in the design matrix). Some examples:
sapply(1:5, trial) # Two binary categories, 1-5 replicates per combination
sapply(1:5, function(i) trial(i, 10, 3)) # 10 x 3 categories (30 combinations), 1-5 replicates
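The parenthetical claim about orthogonality can be checked directly. The following sketch is my own illustration (not part of the function above): it builds one replicate of a balanced design, codes the variables as factors, and shows that the centered dummy columns of one factor are orthogonal to those of the other, which is exactly why the VIFs equal $1$.

# Sketch: in a balanced design, the centered indicator columns for Var1
# are orthogonal to those for Var2.
df <- expand.grid(Var1=factor(1:3), Var2=factor(1:4))  # one replicate, balanced
X  <- model.matrix(~ Var1 + Var2, data=df)[, -1]       # drop the intercept
Xc <- scale(X, center=TRUE, scale=FALSE)               # center each column
round(crossprod(Xc[, 1:2], Xc[, 3:5]), 12)             # every entry is 0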
This suggests the multicollinearity may be growing due to a growing imbalance in the design. To test this, insert the line

df <- subset(df, subset=(y < 0))

before the fit line in trial. This removes roughly half the data at random. Re-running
sapply(1:5, function(i) trial(i, 10, 3))
shows that the VIFs are no longer equal to $1$ (but they remain close to it, varying randomly). They still do not increase with more categories:

sapply(1:5, function(i) trial(i, 10, 10))

produces comparable values.
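For reference, here is the modified function in full; the only change from trial is the inserted subset line, and the name trial2 is mine, to distinguish the two versions:

trial2 <- function(n, k1=2, k2=2) {
  df <- expand.grid(1:k1, 1:k2)
  df <- do.call(rbind, lapply(1:n, function(i) df))
  df$y <- rnorm(k1*k2*n)
  # Remove roughly half the rows at random, unbalancing the design
  df <- subset(df, subset=(y < 0))
  fit <- lm(y ~ Var1 + Var2, data=df)
  vif(fit)
}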
Best Answer
Belsley, Kuh, and Welsch (BKW) is the text to go to for this kind of question. They include extensive discussion of older diagnostics, among them the VIF, in a section entitled "Historical Perspective".
In place of analyzing $R$ (or $R^{-1}$), BKW propose careful, controlled examination of the Singular Value Decomposition of $X$. They motivate it by demonstrating that the ratio of the largest to the smallest singular values is the condition number of $X$ and show how the condition number provides (at times tight) bounds on the propagation of computing errors in the calculation of the regression estimates. They go on to attempt an approximate decomposition of the variances of the parameter estimates $\hat\beta_i$ into components associated with the singular values. The power of this decomposition lies in its ability (in many cases) to reveal the nature of the collinearity, rather than just indicating its presence.
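To make this concrete, here is a minimal sketch of the computation, my own illustration of the BKW procedure rather than code from the book, assuming a fitted lm object fit. It scales the design-matrix columns to unit length (as BKW prescribe), takes the SVD, and forms the condition indexes and the variance-decomposition proportions:

# Sketch of BKW-style collinearity diagnostics for a fitted model `fit`.
X <- model.matrix(fit)
X <- scale(X, center=FALSE, scale=sqrt(colSums(X^2)))  # unit-length columns
s <- svd(X)
condition.indexes <- max(s$d) / s$d   # the largest is the condition number of X
# Var(beta_j) is proportional to sum_k v[j,k]^2 / d[k]^2; the proportions
# below attribute each coefficient's variance to the singular values.
phi <- s$v^2 %*% diag(1 / s$d^2)
vdp <- t(phi / rowSums(phi))          # rows: singular values; columns: coefficients
round(cbind(condition.indexes, vdp), 3)

A large condition index whose row contains high proportions for two or more coefficients points to the specific group of variables involved in a near-dependency.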
Anyone who has built regression models with hundreds of variables will appreciate this feature! It is one thing for the software to say "your data are collinear, I cannot proceed" or even to say "your data are collinear, I'm throwing out the following variables." It is altogether a much more useful thing for it to be able to say "the group of variables $X_{i_1}, \ldots, X_{i_k}$ is causing instabilities in the calculations: see which of those variables you can do without or consider performing a principal components analysis to reduce their number."
Ultimately, BKW recommend diagnosing collinearity by means of these condition indexes together with the associated variance-decomposition proportions.