Software will drop variables when they are collinear. Understanding this situation amounts to figuring out more precisely what that means.
There are three independent variables involved, including the constant term. Let's represent their values as the constant (column) vector $X_1 = (1, 1, \ldots, 1)$, a vector of ones and zeros for the dummy $X_2 = (1, 1, \ldots, 1, 0, 0,\ldots, 0)$, and a third apparently arbitrary vector $X_3 = (x_1, x_2, \ldots, x_n)$. (All other valid dummy codings are linear combinations of this particular $X_1$ and $X_2$, so no generality is lost by assuming that this particular binary (0-1) encoding is used.) I have sorted the data so that all the records where the dummy is $1$ come first; suppose there are $k$ of them. (We know $k \ge 1$ and $k \lt n$, for otherwise the dummy would be constant and could not be included in any regression with a constant term.)
Collinearity of these three vectors along with the $X_2 X_3$ interaction means (by definition) that there is a nontrivial linear relation
$$0 = \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_2 X_3$$
The first $k$ equations in this linear combination are
$$0 = \alpha_1 + \alpha_2 + \alpha_3 x_i + \alpha_4 x_i,\quad i=1, 2, \ldots, k.$$
The remaining equations are
$$0 = \alpha_1 + \alpha_3 x_i,\quad i = k+1, \ldots, n.$$
The first group of equations informs us that all the $(\alpha_3 + \alpha_4)x_i$ are equal to the constant $-(\alpha_1+\alpha_2)$ for $1 \le i \le k$. The second group informs us that all the $\alpha_3 x_i$ are equal to the constant $-\alpha_1$ for $k \lt i \le n$. The first statement does not restrict the $x_i$ for $1 \le i \le k$ provided $\alpha_3 + \alpha_4=0$ (and hence $\alpha_1 + \alpha_2 = 0$), but the second one then implies that all the $x_i$ are equal to one another for $i \gt k$. For if this were not the case, then necessarily $\alpha_3 = 0$ and therefore $\alpha_1 = 0$; together with $\alpha_3 + \alpha_4 = 0$ this gives $\alpha_4 = 0$, and together with $\alpha_1 + \alpha_2 = 0$ it gives $\alpha_2 = 0$, reducing all the $\alpha_i$ to $0$: but that was not the case (the linear relation was nontrivial). If instead $\alpha_3 + \alpha_4 \ne 0$, the first statement forces all the $x_i$ to be equal to one another for $1 \le i \le k$.
In words, what we have deduced is that the continuous variable $X_3$ exhibits no variation among at least one of the two groups of dummy values.
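To make this concrete, the two cases can be written out as explicit linear relations. Here $c$ is a symbol introduced only for this illustration; it denotes the common value of $X_3$ within the group where it does not vary. If $x_i = c$ for all $1 \le i \le k$, then $X_2 X_3 = c X_2$, so that

$$0 = c X_2 - X_2 X_3,\quad (\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (0, c, 0, -1).$$

If instead $x_i = c$ for all $k \lt i \le n$, then $X_3 - X_2 X_3 = c(X_1 - X_2)$, so that

$$0 = c X_1 - c X_2 - X_3 + X_2 X_3,\quad (\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (c, -c, -1, 1).$$

In both cases $\alpha_4 \ne 0$, so the interaction column can be written as a linear combination of the other three.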
To confirm this conclusion we may create three examples of such data in R. I have chosen $k=2$ and $n=4$: there are two records for each group of dummy values. In the first case, assigning random values to $X_3$ virtually guarantees there will be variation within both groups:
> set.seed(17)
> x2 <- c(1, 1, 0, 0) # The dummy (binary) variable, sorted as in the analysis
> x3 <- rnorm(4) # The continuous independent variable
> y <- rnorm(4) # The dependent variable may have *any* values
> lm(y ~ x2*x3)
Coefficients:
(Intercept) x2 x3 x2:x3
0.6763 -0.9218 -1.2728 0.2703
All variables are retained. (This is OLS regression, not logistic regression, but that doesn't matter: both methods behave identically concerning treatment of collinear independent variables.)
In the second case, let's set the first two elements of $X_3$ to the same value:
> x3[1] <- x3[2]; lm(y ~ x2*x3)
Coefficients:
(Intercept) x2 x3 x2:x3
0.6763 -0.4745 -1.2728 NA
The interaction is dropped due to the collinearity.
In the third case, let's set the last two elements of $X_3$ to a common value while varying the first two. To do this, I just reverse all the elements of $X_3$:
> x3 <- rev(x3); lm(y ~ x2*x3)
Coefficients:
(Intercept) x2 x3 x2:x3
1.217 -1.756 -1.605 NA
Once again the interaction is dropped due to collinearity.
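The same three cases can also be checked without fitting anything, by looking at the rank of the design matrix. The following is only a sketch: `design_rank` is a hypothetical helper written for this illustration (it is not part of base R), and it relies on `model.matrix()` and `qr()` with their default settings.

```r
# Rank of the design matrix for y ~ x2*x3 in the three scenarios above.
# design_rank() is a hypothetical helper, not a base R function.
design_rank <- function(x2, x3) {
  X <- model.matrix(~ x2 * x3)  # columns: (Intercept), x2, x3, x2:x3
  qr(X)$rank                    # numerical rank from the QR decomposition
}

set.seed(17)
x2 <- c(1, 1, 0, 0)             # the dummy, sorted as in the analysis
x3 <- rnorm(4)
r1 <- design_rank(x2, x3)       # variation within both groups

x3[1] <- x3[2]
r2 <- design_rank(x2, x3)       # constant within the dummy = 1 group

x3 <- rev(x3)
r3 <- design_rank(x2, x3)       # constant within the dummy = 0 group

print(c(case1 = r1, case2 = r2, case3 = r3))
```

The matrix has four columns, so a rank of 4 means nothing needs to be dropped, while a rank of 3 means one column is redundant given the others.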
It sounds like SPSS behaves in the same way as R in such cases.
Best Answer
Your understanding of confounding and collinearity is correct. Note that in many contexts collinearity really refers to "perfect collinearity" where one variable is a linear combination of one or more other variables, but in some contexts it just refers to "high correlation" between variables.
Of course, in order for confounding to occur, there has to be a degree of correlation, though I would avoid saying "collinearity" due to the above.
However:
A "joint effect" is a good way to understand it, but in no way does it require correlation between the variables. Consider an orthogonal factorial design experiment for example.
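A minimal sketch of such a design in R follows. The factor names `A` and `B`, the 10 replicates per cell, the interaction effect of 2, and the noise level are all assumptions chosen purely for illustration.

```r
# Balanced 2x2 factorial with factors coded -1/+1 (illustrative values only).
set.seed(1)
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), rep = 1:10)

# The response depends on A and B only through their product (the interaction).
d$y <- 1 + 2 * d$A * d$B + rnorm(nrow(d), sd = 0.1)

r_ab <- cor(d$A, d$B)           # zero by construction of the balanced design
fit <- lm(y ~ A * B, data = d)

print(r_ab)
print(coef(fit))
```

Because every combination of levels appears equally often, the two factors are uncorrelated, yet the fitted `A:B` coefficient should come out close to the assumed value of 2 while the main effects stay near zero.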
As another example we could also show this with a simple simulation of bivariate data where X1 and X2 are uncorrelated yet a meaningful interaction exists: