If you only have two variables, you can just check the correlation between them. The VIF is:
$$
\text{VIF}=\frac{1}{\text{tolerance}}=\frac{1}{1-r^2}
$$
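To make that concrete, here is a minimal R sketch; x1 and x2 are made-up predictor vectors, just for illustration:

x1 <- c(2, 4, 5, 7, 9)      # hypothetical predictor
x2 <- c(1, 3, 6, 6, 10)     # hypothetical predictor
r <- cor(x1, x2)            # pairwise Pearson correlation
1 / (1 - r^2)               # VIF, i.e. 1 / tolerance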
On the other hand, kappa is the condition number; that is:
$$
\kappa = \sqrt{\frac{\max(\text{eigenvalue}(X'X))}{\min(\text{eigenvalue}(X'X))}}
$$
One thing that is often recommended with kappa is to center your variables first (note that there is a difference of opinion about this recommendation). If your variables are far from 0, the sampling distributions of the $\beta_j$s will be correlated with the sampling distribution of $\beta_0$ (i.e., the intercept). I suspect that's what you are seeing here.
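For what it's worth, here is a small R sketch of that definition, with toy data (the names are made up); it computes kappa on the raw and on the centered columns so you can see how much centering matters when the variables are far from 0:

kappa_of <- function(M) {
  ev <- eigen(crossprod(M), only.values = TRUE)$values   # eigenvalues of M'M
  sqrt(max(ev) / min(ev))                                # condition number
}
set.seed(1)
X <- cbind(x1 = rnorm(100, mean = 50), x2 = rnorm(100, mean = 50))  # far from 0
kappa_of(X)                                        # raw variables: large kappa
kappa_of(scale(X, center = TRUE, scale = FALSE))   # centered: much smaller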
It might help you to read my question here: Is there a reason to prefer a specific measure of multicollinearity?
The VIF is probably the best way to go here. The Pearson correlation will give you a lousy measure here because it behaves somewhat weirdly for categorical variables like this. Another possibility is to use a matrix of a different measure, such as cosine similarity: $\sum x_i x_j / \sqrt{\sum x_i^2 \sum x_j^2}$. I think that is related to Spearman's rho or Kendall's tau, but I am not sure off the top of my head.
I'd stick to the VIF, though, because it will tell you, for each variable, whether the other variables combined are highly collinear with it. But if you want a visual diagnostic of which pairwise variables are similar, those other metrics are better than Pearson for categorical data.
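If you want to try the cosine measure, a couple of lines of R will do it; this is just a sketch with made-up 0/1 variables in the columns of X:

X <- cbind(a = c(0, 1, 1, 1, 0, 1, 0, 1, 0),   # toy 0/1 variables
           b = c(1, 1, 0, 1, 1, 0, 1, 1, 0),
           c = c(0, 1, 0, 1, 1, 0, 1, 1, 0))
ss <- colSums(X^2)                         # sum of squares for each column
crossprod(X) / sqrt(outer(ss, ss))         # pairwise cosine similarity matrix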
--- EDIT ---
Sure. This has to do primarily with the fact that Pearson's correlation can swing up or down or go negative very easily. Here's an example:
> cor(c(0,1,1,1,0,1,0,1,0),c(1,1,0,1,1,0,1,1,0))
[1] -0.1581139
> cor(c(0,1,1,1,0,1,0,1,0),c(0,1,0,1,1,0,1,1,0))
[1] 0.1
Here, by changing just one of the entries from one to zero we have swung the correlation from negative to positive. But the VIF uses $1/(1-R_{i}^2)$, where $R_{i}^2$ comes from regressing the variable in question on the other variables. I would have to work it out, but I think that is basically a linear combination of something similar to the cosine measure I posted above, or a transform of it. Essentially, though, it can't go negative.
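You can check the "can't go negative" part with the same two vectors: with only two predictors the VIF is just $1/(1-r^2)$, so it stays at or above 1 whichever sign the correlation takes:

x <- c(0, 1, 1, 1, 0, 1, 0, 1, 0)
y1 <- c(1, 1, 0, 1, 1, 0, 1, 1, 0)   # cor(x, y1) is about -0.158
y2 <- c(0, 1, 0, 1, 1, 0, 1, 1, 0)   # cor(x, y2) is 0.1
1 / (1 - cor(x, y1)^2)               # about 1.026
1 / (1 - cor(x, y2)^2)               # about 1.010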
I don't know any literature on it off the top of my head, but I will think about it.
Best Answer
VIF and stepwise regression are two different beasts. Stepwise regression is an exercise in model building, whereas the VIF is a post-estimation diagnostic used to check for multicollinearity. Therefore, there is no answer to the second part of your question ("What variables are different while running both techniques?"), because VIF is not a model-building technique.
With stepwise regression, you are either adding (forward) or deleting (backward) variables from the model and seeing how the estimates change. Typically, variables are "kicked out" of the model if their p-values exceed a threshold pre-set by the researcher (e.g., if $p>0.10$).
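For reference, here is a minimal backward-stepwise sketch in R. The data frame dat and its variables are made up, and note that base R's step() drops terms by AIC rather than by a fixed p-value cutoff:

set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))  # toy predictors
dat$y <- 1 + 2 * dat$x1 + rnorm(50)                                # toy response
full <- lm(y ~ x1 + x2 + x3, data = dat)
step(full, direction = "backward")   # removes terms while the AIC improves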
VIF is computed when you already have a model to work with. The calculation is fairly straightforward. Given the model:
$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_i X_i + \varepsilon
$$
You can calculate the VIF of each parameter estimate $i$ (e.g. $\hat\beta_1,\hat\beta_2,\dots,\hat\beta_i$) using the formula $VIF_i = 1/(1-R_i^2)$, where $R_i^2$ is the $R^2$ from a model predicting $X_i$ using all the other covariates as predictors, e.g.,
$$
X_1 = \alpha_0 + \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_i X_i + \varepsilon
$$
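In R, that calculation looks like the sketch below (again with a made-up data frame dat; if you have the car package installed, car::vif() on the full model returns the same numbers):

set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 + rnorm(50)
r2_1 <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared  # R^2 for x1 on the others
1 / (1 - r2_1)                                           # VIF for x1
# car::vif(lm(y ~ x1 + x2 + x3, data = dat))             # same values, whole model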