Solved – the difference between “controlling for a variable” and interaction

interaction, multiple regression

I've always believed that the purpose of multiple regression was to do what we often call "controlling for a variable". So if I run a multiple regression with height as the dependent variable and weight and amount of soda consumed as independent variables, I would get results for both weight and soda consumed, each controlling for the other. So if weight was significantly associated with height, it would be so regardless of whether there were changes in soda consumption.

However, I then read about interactions, which are often used together with margins, and a tutorial I watched explains interaction in exactly this way: that it is used to control for variables.

I then ask you, what is the difference?

Best Answer

It makes more sense to say that someone becomes heavier if they are taller and/or consume more soda than to say that someone becomes taller if they are heavier and consume more soda. So I assume you mean that the dependent/explained/left-hand-side/y-variable is weight and the independent/explanatory/right-hand-side/x-variables are height and soda consumption. For this example, assume that tall people tend to drink more sodas.

So the model that only controls for soda is:

$\widehat{weight}= b_0 + b_1 height + b_2 soda$

While the model with the interaction effect is:

$\widehat{weight}= b_0 + b_1 height + b_2 soda + b_3 height \times soda$

If you control for soda use, you are comparing people of different height but with the same soda use; that is, you keep the control variable constant. If we had not controlled for soda, then part of the estimated effect of height would actually be the result of tall people drinking more sodas, since those who drink more sodas tend to be heavier. Controlling for soda means that we filter this part out by keeping soda consumption constant. However, in this model there is only one effect of height on weight: regardless of your soda consumption, you will on average gain $b_1$ grams for every centimeter you get taller.
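To make the "filtering out" concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption made up for illustration: the sample size, the coefficients (500 grams per centimeter, 2000 grams per daily soda), and the strength of the link between height and soda are not from any real data. It fits the model once without soda and once with soda as a control.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process; all numbers are invented for illustration.
height = rng.normal(170, 10, n)                       # centimeters
soda = 5 + 0.05 * (height - 170) + rng.normal(0, 1, n)  # tall people drink more soda
weight = -20000 + 500 * height + 2000 * soda + rng.normal(0, 5000, n)  # grams


def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_omit = ols(weight, height)        # soda left out of the model
b_ctrl = ols(weight, height, soda)  # soda included as a control

print("height coefficient, soda omitted:   ", round(b_omit[1], 1))
print("height coefficient, soda controlled:", round(b_ctrl[1], 1))
```

Under these assumptions the coefficient of height without the control should come out larger than the one with the control, because it also picks up the weight that tall people gain through their extra soda consumption. With soda in the model, the height coefficient should land near the 500 used to generate the data.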

If you add an interaction effect, you say that the effect of height differs depending on your soda consumption. If both variables enter the model linearly, the effect of height on weight is $b_1+b_3\times soda$. So if one does not drink soda at all, i.e. $soda$ is 0, then you will gain on average $b_1$ grams for every centimeter you get taller. However, if you drink 10 sodas a day, then you will gain on average $b_1 + 10\times b_3$ grams for every centimeter you get taller. These different effects of height on weight are also controlled for soda: in the first case soda consumption is kept constant at 0, while in the second case it is kept constant at 10.
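The same idea can be sketched for the interaction model. Again, this is only an illustration under made-up assumptions: the simulated coefficients ($b_1 = 500$, $b_3 = 10$), the sample size, and the choice to draw soda independently of height are all invented here to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical data; soda is drawn independently of height to keep things simple.
height = rng.normal(170, 10, n)   # centimeters
soda = rng.uniform(0, 10, n)      # sodas per day
weight = (-20000 + 500 * height + 500 * soda
          + 10 * height * soda + rng.normal(0, 5000, n))  # grams

# Design matrix: intercept, height, soda, and the product height * soda.
X = np.column_stack([np.ones(n), height, soda, height * soda])
b, *_ = np.linalg.lstsq(X, weight, rcond=None)
b0, b1, b2, b3 = b


def height_effect(soda_level):
    """Estimated grams gained per extra centimeter at a given soda level."""
    return b1 + b3 * soda_level


print("effect of height at soda = 0: ", round(height_effect(0), 1))
print("effect of height at soda = 10:", round(height_effect(10), 1))
```

The two printed numbers are the effect of height with soda held constant at 0 and at 10 respectively, so both are "controlled for soda"; the interaction term is what allows them to differ.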
