Using some dummy data similar to what you presumably used:
set.seed(10)
dat <- data.frame(ART.conc = factor(sample(c("yes", "no"), 10, replace = TRUE),
                                    levels = c("yes", "no")),
                  Parity = factor(sample(1:3, 10, replace = TRUE)))
and, for simplicity, form the interaction variable and store it in `dat`:
dat <- transform(dat, interact = interaction(ART.conc, Parity))
OK, now `model.matrix()` will show us how the factor variables are converted into the terms that represent them in the model. First, the correct way:
> model.matrix( ~ ART.conc * Parity, data = dat)
   (Intercept) ART.concno Parity2 Parity3 ART.concno:Parity2 ART.concno:Parity3
1            1          1       1       0                  1                  0
2            1          0       1       0                  0                  0
3            1          0       0       0                  0                  0
4            1          1       1       0                  1                  0
5            1          0       1       0                  0                  0
6            1          0       1       0                  0                  0
7            1          0       0       0                  0                  0
8            1          0       0       0                  0                  0
9            1          1       1       0                  1                  0
10           1          0       0       1                  0                  0
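(Incidentally, `a * b` in a formula is just shorthand for `a + b + a:b`; a quick sketch, using the dummy `dat` above, confirming the two spellings give the same design matrix:)

m1 <- model.matrix( ~ ART.conc * Parity, data = dat)
m2 <- model.matrix( ~ ART.conc + Parity + ART.conc:Parity, data = dat)
all.equal(m1, m2)  # TRUE: `*` expands to the main effects plus the interaction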
And now the way you did it:
> model.matrix( ~ interact + ART.conc + Parity, data = dat)
   (Intercept) interactno.1 interactyes.2 interactno.2 interactyes.3
1            1            0             0            1             0
2            1            0             1            0             0
3            1            0             0            0             0
4            1            0             0            1             0
5            1            0             1            0             0
6            1            0             1            0             0
7            1            0             0            0             0
8            1            0             0            0             0
9            1            0             0            1             0
10           1            0             0            0             1
   interactno.3 ART.concno Parity2 Parity3
1             0          1       1       0
2             0          0       1       0
3             0          0       0       0
4             0          1       1       0
5             0          0       1       0
6             0          0       1       0
7             0          0       0       0
8             0          0       0       0
9             0          1       1       0
10            0          0       0       1
The obvious difference is that these two are not the same: there are far more terms in the design matrix from the `interaction()` method than from the correct way.
Now we can also see why, in the model with the `interaction()` term, you get `NA` values for some coefficients. Compare the `interactyes.3` column with the `Parity3` column in the design matrix from the `interaction()` model. Look familiar? So `interaction()` has created information that duplicates the main-effects terms you added to the formula alongside the `interaction()` variable. At the very least, you don't need to specify the main-effects terms alongside the `interaction()` variable in that version of the model. However, it would still be wrong even if you fixed that.
Consider
> with(dat, levels(interact))
[1] "yes.1" "no.1" "yes.2" "no.2" "yes.3" "no.3"
> with(dat, table(interact))
interact
yes.1  no.1 yes.2  no.2 yes.3  no.3
    3     0     3     3     1     0
What `interaction()` has done (by default, I'll add) is retain all possible combinations of the levels of the two factors; that's 6, even though we only observed 4 of them in this dummy data set. This shows up in the model matrix as a column of 0s for each of `interactno.1` and `interactno.3` if you include just the `interaction()` term in the model:
> model.matrix( ~ interact, data = dat)
   (Intercept) interactno.1 interactyes.2 interactno.2 interactyes.3
1            1            0             0            1             0
2            1            0             1            0             0
3            1            0             0            0             0
4            1            0             0            1             0
5            1            0             1            0             0
6            1            0             1            0             0
7            1            0             0            0             0
8            1            0             0            0             0
9            1            0             0            1             0
10           1            0             0            0             1
   interactno.3
1             0
2             0
3             0
4             0
5             0
6             0
7             0
8             0
9             0
10            0
Hence we can't identify coefficients for either of those all-0 columns, so you'll still get `NA`s.
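To see this in a fitted model, here's a quick sketch; `y` is a made-up response, purely for illustration:

dat$y <- rnorm(10)                  # hypothetical response variable
coef(lm(y ~ interact, data = dat))  # coefficients for interactno.1 and
                                    # interactno.3 come back as NA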
In summary, the two approaches you mention are not the same model, and you'll cause yourself a world of pain if you try to fit interactions using the `interaction()` function.
Hopefully by now you see that these forms aren't the same and why, and, more importantly, why it is a bad idea to circumvent R's formula interface.
Aside
I mentioned the default behaviour of `interaction()`. You can do better by adding `drop = TRUE` to the call:
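(The code creating `interact2` wasn't shown; presumably it was something like this:)

dat <- transform(dat, interact2 = interaction(ART.conc, Parity, drop = TRUE))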
> with(dat, levels(interact2))
[1] "yes.1" "yes.2" "no.2"  "yes.3"
Now we only have the four observed levels. Then you can do:
> model.matrix( ~ ART.conc + Parity + interact2, data = dat)
   (Intercept) ART.concno Parity2 Parity3 interact2yes.2 interact2no.2
1            1          1       1       0              0             1
2            1          0       1       0              1             0
3            1          0       0       0              0             0
4            1          1       1       0              0             1
5            1          0       1       0              1             0
6            1          0       1       0              1             0
7            1          0       0       0              0             0
8            1          0       0       0              0             0
9            1          1       1       0              0             1
10           1          0       0       1              0             0
   interact2yes.3
1               0
2               0
3               0
4               0
5               0
6               0
7               0
8               0
9               0
10              1
But this still isn't right, because you have one more term than needed; again, R can't do the right thing and remove one of the combinations of levels, which gets absorbed into the `(Intercept)` term. Note the `Parity3` and `interact2yes.3` terms: they are coded the same way. The `Parity3` term works with the `(Intercept)` term to give you the information in `interact2yes.3`.
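One quick way to confirm the aliasing is to compare the number of columns of the design matrix with its rank; a sketch:

X <- model.matrix( ~ ART.conc + Parity + interact2, data = dat)
ncol(X)     # number of columns (terms) in the design matrix
qr(X)$rank  # smaller than ncol(X) here, so some coefficients
            # cannot be estimated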
You know that in a logit:
$$Pr[y = 1 \vert x,z] = p = \frac{\exp (\alpha + \beta \cdot \ln x + \gamma z)}{1+\exp (\alpha + \beta \cdot \ln x + \gamma z )}. $$
After some tedious calculus and simplification, the partial of that with respect to $x$ becomes:
$$ \frac{\partial Pr[y=1 \vert x,z]}{\partial x} = \frac{\beta}{x} \cdot p \cdot (1-p). $$
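(The tedious part is just the chain rule: writing $\eta = \alpha + \beta \cdot \ln x + \gamma z$, we have $\partial p/\partial \eta = p(1-p)$ for the logistic function, and $\partial \eta/\partial x = \beta/x$.)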
This is (sort of) equivalent to
$$\frac{\Delta p}{\Delta x}=\frac{\beta}{x} \cdot p \cdot (1-p),$$
which can be re-written as
$$\frac{\Delta p}{100 \cdot \frac{ \Delta x}{x}}= \frac{\beta \cdot p \cdot (1-p)}{100}.$$
This is the definition of semi-elasticity, and can be interpreted as the change in probability for a 1% change in $x$.
Here's an example in Stata.* Note that I am using `margins` instead of the out-of-date `mfx` to get the average marginal effect of $x$, $\frac{1}{N}\sum_{i=1}^N\frac{\beta \cdot p_i \cdot (1-p_i)}{100}$:
. sysuse auto, clear
(1978 Automobile Data)

. gen ln_price = ln(price)

. logit foreign ln_price mpg weight, nolog

Logistic regression                               Number of obs   =         74
                                                  LR chi2(3)      =      57.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -16.185932                       Pseudo R2       =     0.6406

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ln_price |   6.851215    2.11763     3.24   0.001     2.700737    11.00169
         mpg |  -.0880842   .1031317    -0.85   0.393    -.2902186    .1140503
      weight |  -.0062268   .0017269    -3.61   0.000    -.0096115   -.0028422
       _cons |  -41.32383   16.24003    -2.54   0.011    -73.15371   -9.493947
------------------------------------------------------------------------------

. margins, expression(_b[ln_price]*predict()*(1-predict())/100)

Predictive margins                                Number of obs   =         74
Model VCE    : OIM

Expression   : _b[ln_price]*predict()*(1-predict())/100

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .0046371   .0007965     5.82   0.000      .003076    .0061982
------------------------------------------------------------------------------
This means that for a 1% increase in price, the probability that a car is foreign increases by 0.005 on a [0,1] scale; a 10% increase in price gives you a 0.05 increase. In these data, about 0.3 of the cars are foreign, so these effects are economically and statistically significant.
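For comparison, here is a minimal R sketch of the same average semi-elasticity calculation; it assumes a data frame `auto` holding the same 1978 automobile data (variable names mirror the Stata example and are otherwise hypothetical):

# assumes `auto` has foreign (0/1), price, mpg, weight
fit  <- glm(foreign ~ log(price) + mpg + weight, family = binomial, data = auto)
p    <- fitted(fit)                 # predicted Pr(foreign = 1)
beta <- coef(fit)[["log(price)"]]
mean(beta * p * (1 - p) / 100)      # average semi-elasticity:
                                    # change in probability per 1% change in price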
Edit:
A good way to do this in Stata 10 is to install the user-written command `margeff`:
. margeff, dydx(ln_price) replace

Average partial effects after margeff
      y = Pr(foreign)

------------------------------------------------------------------------------
    variable |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ln_price |   .4637103   .0796514     5.82   0.000     .3075964    .6198241
         mpg |  -.0059616    .006781    -0.88   0.379    -.0192522     .007329
      weight |  -.0004214   .0000417   -10.11   0.000    -.0005031   -.0003398
------------------------------------------------------------------------------

. lincom _b[ln_price]/100

 ( 1)  .01*ln_price = 0

------------------------------------------------------------------------------
    variable |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0046371   .0007965     5.82   0.000      .003076    .0061982
------------------------------------------------------------------------------
*This is actually not a great empirical example since the relationship in the data has an inverted-U shape.
Best Answer
The interpretation you want to put on the covariate change makes sense in a simple model with one covariate. If you have two or more covariates, there could be some interaction, and the best you can say in general is that x% is the magnitude of the change when you change variable U by one standard deviation with the others held fixed at a particular value. At other places in the covariate space, the magnitude of the change could be different. If a variable has been log-transformed, you can still talk about this amount of change on the log scale, but if you want to make the claim on the original scale, you would have to work out how the change on the log scale translates into a change on the original scale.
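For a concrete illustration of the interaction point (the model and symbols here are made up for the example): if

$$\log y = \beta_0 + \beta_1 U + \beta_2 V + \beta_3 U V,$$

then a one-standard-deviation increase in $U$ changes $\log y$ by $(\beta_1 + \beta_3 V)\, s_U$, which depends on the value at which $V$ is held fixed.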