Solved – Correlations between explanatory variables in regression

ancova, linear model, multiple regression, regression

I'm just starting to learn about linear regression models and time series analysis, and I ran into the following question.

Suppose we have a variable $Y$ that we're trying to model using $p$ explanatory variables $X_i$, maybe using a simple linear model such as:

$$Y = a_0 + a_1X_1 + \dots + a_pX_p$$

My question is the following: what happens if the explanatory variables are correlated with each other? In the assumptions for this model I see no mention of this, but clearly there has to be some kind of qualitative difference depending on the degree to which the variables are correlated. For example, if $X_1 = X_3^2$, then what is the point of including both $X_1$ and $X_3$ in the model? Intuitively, it seems that once one of them is included, the other carries no additional 'information'.

Perhaps depending on how correlated the variables are one would use a different model or approach?

Best Answer

If $$Y = a_0 + a_1X_1 + a_2X_2 + a_3X_3 + \dots + a_pX_p$$ and $$X_1 = X_3^2,$$ then $X_1$ is unnecessary as a separate variable: substitute it out and regress $$Y = a_0 + a_3X_3 + a_1X_3^2 + a_2X_2 + \dots + a_pX_p.$$ However, this example does not really bear on the question of how to treat correlated variables, because its answer rests on an exact relationship, i.e., an equality, which lets us eliminate $X_1$ entirely as perfectly redundant.
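The substitution can be checked numerically. The following is only a minimal sketch on simulated data: the coefficient values, the noise level, the symmetric grid for $X_3$, and all variable names are assumptions made for illustration, not part of the question. It fits the model with $X_1$ replaced by $X_3^2$, and then shows that adding $X_3^2$ as a separate column next to $X_1$ leaves the design matrix rank-deficient, which is the "perfectly redundant" case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

x2 = rng.normal(size=n)
x3 = np.linspace(-2.0, 2.0, n)   # assumed: symmetric grid of X3 values
x1 = x3 ** 2                     # the exact relationship X1 = X3^2

# Hypothetical "true" coefficients (a0, a1, a2, a3), used only to simulate Y
a_true = np.array([1.0, 2.0, -0.5, 3.0])
y = (a_true[0] + a_true[1] * x1 + a_true[2] * x2 + a_true[3] * x3
     + rng.normal(scale=0.1, size=n))

# Regress Y on 1, X3^2, X2, X3 (X1 enters only through X3^2)
X = np.column_stack([np.ones(n), x3 ** 2, x2, x3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adding X1 as its own column duplicates X3^2, so the rank does not increase
X_dup = np.column_stack([X, x1])
rank_X = np.linalg.matrix_rank(X)
rank_dup = np.linalg.matrix_rank(X_dup)

# Pearson correlation between X3 and X3^2 on this symmetric grid
corr_x3_x1 = np.corrcoef(x3, x1)[0, 1]

print("estimated (a0, a1, a2, a3):", np.round(coef, 3))
print("rank without / with duplicate column:", rank_X, rank_dup)
print("corr(X3, X3^2):", corr_x3_x1)
```

On this particular design the Pearson correlation between $X_3$ and $X_3^2$ is essentially zero even though one determines the other exactly, so the redundancy here comes from the equality itself rather than from linear correlation.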

In the analysis of covariance (ANCOVA) setting, a covariate enters the model as follows, assuming that a linear relationship exists between the response (DV) and the covariate (CV): $$y_{ij} = \mu + \tau_i + \beta (x_{ij} - \overline{x_i}) + \epsilon_{ij}.$$

In this equation, the DV $y_{ij}$ is the $j^{th}$ observation in the $i^{th}$ categorical group, and the CV $x_{ij}$ is the $j^{th}$ observation of the covariate in the $i^{th}$ group. The quantities in the model that are derived from the observed data are $\mu$ (the grand mean) and $\overline{x_i}$ (the $i^{th}$ group mean of the covariate). The quantities to be fitted are $\tau_i$ (the effect of the $i^{th}$ level of the independent variable, IV) and $\beta$ (the slope of the line); $\epsilon_{ij}$ is the associated unobserved error term for the $j^{th}$ observation in the $i^{th}$ group.
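As an illustration of how the terms in this model can be estimated, here is a minimal least-squares sketch on simulated data. The number of groups, the group sizes, the "true" parameter values, and all variable names are assumptions made for the example. It uses a balanced design, so that the grand mean of $y$ coincides with the average of the fitted group intercepts.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 3, 50                                  # assumed: 3 groups, 50 observations each
group = np.repeat(np.arange(k), m)

# Hypothetical parameters used only to simulate the data
mu_true, beta_true = 5.0, 2.0
tau_true = np.array([-1.0, 0.0, 1.0])         # group effects, summing to zero

# Covariate, centred on its own group mean as in the model above
x = rng.normal(loc=group, scale=1.0, size=k * m)
x_bar = np.array([x[group == i].mean() for i in range(k)])
xc = x - x_bar[group]

y = (mu_true + tau_true[group] + beta_true * xc
     + rng.normal(scale=0.2, size=k * m))

# Design matrix: one indicator column per group plus the centred covariate
D = np.column_stack([(group == i).astype(float) for i in range(k)] + [xc])
est, *_ = np.linalg.lstsq(D, y, rcond=None)

alpha_hat, beta_hat = est[:k], est[k]         # alpha_i estimates mu + tau_i
mu_hat = y.mean()                             # grand mean
tau_hat = alpha_hat - mu_hat                  # estimated group effects
resid = y - D @ est                           # residuals, standing in for epsilon_ij

print("mu_hat:", round(mu_hat, 3))
print("tau_hat:", np.round(tau_hat, 3))
print("beta_hat:", round(beta_hat, 3))
```

Fitting one intercept per group and then subtracting the grand mean is one convenient parameterisation; other codings of the group effects give the same fitted values.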

This answer is only partial in the sense that the variations on the theme are numerous, and each circumstance requires special treatment.
