We use models with multiplicative interaction effects when the relationship between the independent variables and the dependent variable is non-additive.
My question is: are all models with multiplicative interaction effects non-linear? And are all models with additive interaction effects linear?
The answer to such a question depends on what you mean when you say 'linear' and 'nonlinear', and what domain of models you're restricting yourself to.
Usually the terms 'linear' and 'nonlinear' in statistical models refer to linearity in the parameters, not the variables.
So for example, $y = \alpha x^2 +\epsilon$ is linear in $\alpha$ though not in $x$, while $y = \exp(-\alpha) x +\epsilon$ is non-linear in $\alpha$, though it is in $x$. In usual parlance, the first is a linear model and the second is not. However, in those cases at least both may be turned into models that are linear in both the parameters and the predictors - in the first case by the transformation $x^* = x^2$, giving a model that has a linear relationship between $y$ and $x^*$, and in the second case by the reparameterization $\alpha^* = \exp(-\alpha)$.
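Both fixes mentioned above can be checked numerically. Here's a minimal sketch (the value $\alpha = 2.5$ and the noise level are made up for illustration) showing that $y = \alpha x^2 + \epsilon$, though curved in $x$, is fitted perfectly well by ordinary least squares once we transform to $x^* = x^2$:

```python
import numpy as np

# Hypothetical illustration: y = alpha * x^2 + noise is linear in alpha,
# so OLS on the transformed predictor x* = x^2 recovers alpha directly.
rng = np.random.default_rng(0)
x = np.linspace(1, 5, 200)
alpha_true = 2.5
y = alpha_true * x**2 + rng.normal(0, 0.1, size=x.size)

x_star = x**2                                        # the linearizing transformation
alpha_hat = np.sum(x_star * y) / np.sum(x_star**2)   # OLS through the origin
print(alpha_hat)                                     # close to 2.5
```

The model is curved as a function of $x$, yet the estimation problem is an entirely standard linear one; that is the sense in which it is a 'linear model'.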
As such, a standard general linear model (regression-type model) with a multiplicative interaction is linear in the parameters, even though it's not linear in either predictor (IV). However, note that even in terms of the IVs it is conditionally linear: fix one of the IVs and the relationship is linear in the other.
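That conditional linearity is easy to see numerically. A small sketch with made-up coefficients: in $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, holding $x_1$ fixed at $c$ leaves a straight line in $x_2$ with slope $\beta_2 + \beta_3 c$:

```python
# Hypothetical coefficients for a model with a multiplicative interaction.
b0, b1, b2, b3 = 1.0, 2.0, 3.0, 4.0

def mean_y(x1, x2):
    return b0 + b1*x1 + b2*x2 + b3*x1*x2

for c in (0.0, 1.0, 2.0):                 # fix x1 at several values
    slope = mean_y(c, 1.0) - mean_y(c, 0.0)   # exact finite difference for a line
    print(c, slope)                            # slope equals b2 + b3*c
```

The slope in $x_2$ changes with $c$, but for each fixed $c$ the relationship is exactly linear.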
[Minor mathematical aside: It should be noted that when we're talking about the relationships of $y$ and some $x$ being linear (in this sense rather than the 'makes a straight-line' sense), if we recognize we're using homogeneous co-ordinates in regression, it is linear. I mention it because I have seen people with enough mathematical background to be familiar with the mathematical definition of linearity object that 'linear regression is not linear'.]
all models with additive interaction effects linear?
If I understand what you're even asking with 'additive interaction effects', there's really no such thing. If it's additive it's already in the main effects and there's nothing left over for some notional 'interaction'.
Also, with non-linearity, the effect of the independent variable on the dependent variable depends on the value of the independent variable,
Only if you think of 'effect' as inherently linear.
in effect, the independent variable somehow interacts with itself.
This way lies much confusion. Why not just think of there being a relationship that's described by some curve rather than by a straight line?
--
Edit to address followup questions:
What do you mean when you say "what domain of models you're restricting yourself to"?
When you said "all models with multiplicative interaction effects" you presumably meant 'all models' in some class, such as regression models, or general linear models, or generalized linear models, or ... the list could go on for some time.
Thanks for noting that about linearity. For the longest time, even I thought being linear meant the relationship was a straight line.
Me too.
This does clear some doubts, but raises a few questions. So, if we recognize we're using homogeneous co-ordinates in regression, it is linear.
In terms of $x$'s - the actual columns of the $X$-matrix in a regression - it's linear in that linked mathematical sense if you realize you're working with homogeneous co-ordinates.
A multiple linear regression is already linear in the mean parameters (i.e. the $\beta$ vector, the parameters other than $\sigma^2$), without any need to invoke homogeneous co-ordinates. I explicitly stated that I was referring to the relationship with the $x$'s when I raised homogeneous co-ordinates.
Also, did you mean to say "Only if you think of 'effect' as inherently nonlinear" instead of linear?
Nope. The way you phrased the question I was responding to only makes sense if you take the word 'effect' to imply linearity, otherwise the whole notion of 'interaction with itself' seems to be utterly meaningless. How is one to interpret the phrase?
What I meant to ask was: I read somewhere that "with non-linearity, the effect of X on Y depends on the value of X and X somehow interacts with itself".
I regard the statement as an unhelpful attempt at analogy, and, as already explained, I think you should not think about it this way. Not everything that someone writes down is useful.
Does this mean that X interacts with itself (X)? Or does it mean that X interacts with other variables (X, W, etc.), if any?
I'm not going to make any further attempt to interpret something that doesn't really make sense as a bare, general, statement without first having more clarification of its intent. I've suggested a way to interpret it that makes at least a little sense. If you want to interpret it more generally, explaining what it means would be up to you - or the original author of it.
I expect that if you were to ask "what, exactly, does it mean?", you would receive an answer that contained a number of hidden premises, and one of those premises would rely, directly or indirectly, on taking the underlying meaning of 'effects' to be linear, when we have no good reason to do that.
The standard OLS model is $Y = X \beta + \varepsilon$ with $\varepsilon \sim \mathcal N(\vec 0, \sigma^2 I_n)$ for a fixed $X \in \mathbb R^{n \times p}$.
This does indeed mean that $Y|\{X, \beta, \sigma^2\} \sim \mathcal N(X\beta, \sigma^2 I_n)$, although this is a consequence of our assumption on the distribution of $\varepsilon$, rather than actually being the assumption. Also keep in mind that I'm talking about the conditional distribution of $Y$, not the marginal distribution of $Y$. I'm focusing on the conditional distribution because I think that's what you're really asking about.
I think the part that is confusing is that this doesn't mean that a histogram of $Y$ will look normal. We are saying that the entire vector $Y$ is a single draw from a multivariate normal distribution where each element has a potentially different mean $E(Y_i|X_i) = X_i^T\beta$. This is not the same as being an iid normal sample. The errors $\varepsilon$ actually are an iid sample so a histogram of them would look normal (and that's why we do a QQ plot of the residuals, not the response).
Here's an example: suppose we are measuring height $H$ for a sample of 6th graders and 12th graders. Our model is $H_i = \beta_0 + \beta_1I(\text{12th grader}) + \varepsilon_i$ with $\varepsilon_i \sim \ \text{iid} \ \mathcal N(0, \sigma^2)$. If we look at a histogram of the $H_i$ we'll probably see a bimodal distribution, with one peak for 6th graders and one peak for 12th graders, but that doesn't represent a violation of our assumptions.
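A quick simulation of that height example makes the point concrete. The group means and $\sigma$ below are made-up round numbers (in cm), not real data: the response is clearly bimodal, yet the residuals around the group means are a single iid normal sample.

```python
import numpy as np

# Sketch of the 6th-grader / 12th-grader example with hypothetical parameters.
rng = np.random.default_rng(1)
n = 500
grade12 = np.repeat([0, 1], n)            # indicator I(12th grader)
beta0, beta1, sigma = 150.0, 25.0, 5.0    # made-up values, heights in cm
h = beta0 + beta1*grade12 + rng.normal(0, sigma, size=2*n)

# A histogram of h would show two peaks about 25 cm apart...
m0 = h[grade12 == 0].mean()
m1 = h[grade12 == 1].mean()
print(m0, m1)

# ...but the residuals around the fitted group means are one iid normal
# sample, which is exactly what a QQ plot of residuals checks.
resid = h - np.where(grade12 == 1, m1, m0)
print(resid.std())   # close to sigma
```

This is why model diagnostics target the residuals rather than the raw response: the normality assumption lives in $\varepsilon$, not in the marginal distribution of $Y$.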
Best Answer
I think you're conflating two different concepts here.
Additive effects in linear models
Linear regression assumes that the impact of different covariates is additive, so a simple linear model (for two predictors and a response) would look like this:
$$ y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
As an example, let's say you changed the value of $x_2$ by adding 1, such that $\tilde{x}_2 = x_2 + 1$, then you would have:
$$ \begin{aligned} \tilde{y} &= \alpha + \beta_1 x_1 + \beta_2 \tilde{x}_2 + \epsilon \\ &= \alpha + \beta_1x_1 + \beta_2 (x_2 + 1) + \epsilon \\ &= (\alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon) + \beta_2 \\ &= y + \beta_2 \end{aligned}$$
So you can see pretty clearly that, in a linear model, it doesn't matter what value $x_1$ takes: the effect of an incremental change in $x_2$ is the same.
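The derivation above can be verified numerically in a few lines. The coefficients here are hypothetical round numbers; the point is that bumping $x_2$ by 1 changes the mean response by exactly $\beta_2$, whatever value $x_1$ takes:

```python
# Hypothetical coefficients for the additive model y = alpha + b1*x1 + b2*x2.
alpha, beta1, beta2 = 0.5, 2.0, -1.5

def mean_y(x1, x2):
    return alpha + beta1*x1 + beta2*x2

for x1 in (0.0, 3.0, 10.0):               # try very different values of x1
    change = mean_y(x1, 6.0) - mean_y(x1, 5.0)
    print(x1, change)                      # always equals beta2
```

Contrast this with the interaction model discussed earlier, where the same one-unit bump in $x_2$ would change the response by an amount that depends on $x_1$.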
Adding additional predictors
This is very different from the situation where you add a new predictor to your model! When you add a new predictor, it is quite possible that the other coefficients will change.
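Here's a small simulated sketch of that phenomenon (the coefficients and correlation structure are invented for illustration): when a new predictor correlated with $x_1$ enters the model, the fitted coefficient on $x_1$ shifts substantially.

```python
import numpy as np

# Hypothetical data-generating process: x2 is correlated with x1, and the
# true model is y = 1 + 2*x1 + 3*x2 + noise.
rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8*x1 + rng.normal(scale=0.6, size=n)       # correlated with x1
y = 1.0 + 2.0*x1 + 3.0*x2 + rng.normal(scale=0.5, size=n)

X_small = np.column_stack([np.ones(n), x1])        # model without x2
X_full = np.column_stack([np.ones(n), x1, x2])     # model with x2 added
b_small, *_ = np.linalg.lstsq(X_small, y, rcond=None)
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print(b_small[1])   # near 2 + 3*0.8 = 4.4: x1 absorbs the omitted x2
print(b_full[1])    # near the true value 2 once x2 is in the model
```

The coefficient on $x_1$ in the smaller model soaks up the omitted, correlated predictor; adding $x_2$ changes it, which is exactly why adding a predictor is not comparable to shifting an existing one.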