Intercept in demeaned and rescaled regression model

Tags: regression, regression coefficients

Suppose I have a linear model:

$Y=X\beta+\epsilon$

where $X$ is $(n \times p)$ and its first column is an intercept column (consisting only of ones). Now suppose I construct $\tilde{X}$ by demeaning (i.e. subtracting the column mean from each entry) and rescaling (i.e. dividing by the column norm) all columns apart from the intercept column, and then use the model:

$Y=\tilde{X}\tilde{\beta}+\epsilon$

My questions are:

  • Will $\hat{\beta}=\hat{\tilde{\beta}}$ when I use standard OLS? Will any of the $\beta$ values be the same and why?
  • If they are not the same, is there some way for me to convert $\hat{\beta}$ to $\hat{\tilde{\beta}}$ and vice versa?
  • Do I still need the intercept column in $\tilde{X}$ if all my other columns are demeaned?
  • Does the intercept still have the same interpretation for both $X$ and $\tilde{X}$, why/why not?

Thank you in advance!

Best Answer

Let's look at a very simple case, where $X$ consists only of the intercept column and one predictor $x$. In this case, your fits are

$$ \hat{y}_i = \hat{\beta}_0+\hat{\beta}_x x_i. $$

We want to express this in terms of the rescaled predictor $\tilde{x} := \frac{x-m}{s}$, where $m$ and $s$ are the mean and column norm of $x$ (or, in fact, any other shift and any nonzero scaling constant), and the accompanying updated regression coefficients,

$$ \hat{y}_i = \hat{\tilde{\beta}}_0+\hat{\tilde{\beta}}_x\tilde{x}_i. $$

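(Note that the fits will be identical: $\tilde{x}$ is an affine function of $x$, so both design matrices span the same column space, and both regressions minimize the sum of squared residuals over that same space.)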

A substitution yields

$$ \hat{y}_i = \hat{\beta}_0+\hat{\beta}_x x_i = \hat{\beta}_0+\hat{\beta}_x(s\tilde{x}_i+m) = \hat{\beta}_0+m\hat{\beta}_x+s\hat{\beta}_x\tilde{x}_i. $$

We obtain a relationship between the original and the updated regression coefficients:

$$ \hat{\tilde{\beta}}_0 = \hat{\beta}_0+m\hat{\beta}_x\qquad\text{and}\qquad \hat{\tilde{\beta}}_x = s\hat{\beta}_x.$$
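
The relationship above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original answer: the sample size, the true coefficients, and the helper name `fit_ols` are made up for illustration, and $s$ is assumed to be the norm of the demeaned column.

```python
import numpy as np

def fit_ols(design, response):
    """Return the least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=3.0, scale=2.0, size=n)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=n)

# Shift and scale constants: mean, and norm of the demeaned column (assumption).
m = x.mean()
s = np.linalg.norm(x - m)
x_tilde = (x - m) / s

ones = np.ones(n)
X = np.column_stack([ones, x])
X_tilde = np.column_stack([ones, x_tilde])

beta = fit_ols(X, y)              # (beta_0, beta_x)
beta_tilde = fit_ols(X_tilde, y)  # (beta_tilde_0, beta_tilde_x)

# Apply the conversion derived above to the original coefficients.
converted = np.array([beta[0] + m * beta[1], s * beta[1]])

fits_match = np.allclose(X @ beta, X_tilde @ beta_tilde)
coefs_match = np.allclose(converted, beta_tilde)
print("identical fits:", fits_match)
print("conversion reproduces rescaled coefficients:", coefs_match)
```

Going the other way only requires inverting the two equations: divide the rescaled slope by `s`, then subtract `m` times the recovered slope from the rescaled intercept.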

So, to answer your questions:

  • Will $\hat{\beta}=\hat{\tilde{\beta}}$ when I use standard OLS? Will any of the $\beta$ values be the same and why?

    In general, no to both, as per above: the intercept changes unless $m\hat{\beta}_x=0$, and the slope changes unless $s=1$.

  • If they are not the same, is there some way for me to convert $\hat{\beta}$ to $\hat{\tilde{\beta}}$ and vice versa?

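    You can solve the equations above to convert the regression coefficients: $\hat{\beta}_x = \hat{\tilde{\beta}}_x/s$ and $\hat{\beta}_0 = \hat{\tilde{\beta}}_0 - m\hat{\tilde{\beta}}_x/s$. The case of several predictors is analogous: each slope is multiplied by the scale $s_j$ of its own column, and the new intercept is $\hat{\beta}_0+\sum_j m_j\hat{\beta}_j$.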

  • Do I still need the intercept column in $\tilde{X}$ if all my other columns are demeaned?

    Yes, unless you also demean your response $Y$. See below.

  • Does the intercept still have the same interpretation for both $X$ and $\tilde{X}$, why/why not?

    The interpretation is similar. In both models, the intercept is the expected value of the response when the predictor is zero. However, "the predictor" is the original predictor $x$ in the original model, and the demeaned and rescaled predictor $\tilde{x}$ in the updated model. Since the two are different, their being zero means different things ($\tilde{x}=0$ corresponds to $x=m$), and so the expected response at zero differs as well. This is why the intercept changes between the two models. (And regarding the previous question: with demeaned predictors, the estimated intercept is the sample mean of the response, so if you also demean the response, the intercept coefficient will be estimated to be zero.)
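
The answers above can be illustrated with several predictors at once. This is a hedged sketch on simulated data, not taken from the original answer: the number of predictors, the coefficients, and the helper name `fit_ols` are assumptions for illustration, and each column is scaled by the norm of its demeaned values. It checks the coefficient conversion, that the intercept of the demeaned model equals the sample mean of $Y$, and that the intercept column becomes unnecessary once $Y$ is demeaned too.

```python
import numpy as np

def fit_ols(design, response):
    """Return the least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Simulated data with three predictors (illustrative values only).
rng = np.random.default_rng(1)
n, k = 80, 3
predictors = rng.normal(loc=[2.0, -1.0, 5.0], scale=[1.0, 3.0, 0.5], size=(n, k))
y = 4.0 + predictors @ np.array([0.5, -1.2, 2.0]) + rng.normal(scale=0.3, size=n)

# Demean each column, then divide by the norm of the demeaned column.
means = predictors.mean(axis=0)
centered = predictors - means
norms = np.linalg.norm(centered, axis=0)
scaled = centered / norms

ones = np.ones((n, 1))
beta = fit_ols(np.hstack([ones, predictors]), y)
beta_tilde = fit_ols(np.hstack([ones, scaled]), y)

# Conversion: slopes are multiplied by s_j, intercept absorbs sum_j m_j * beta_j.
slopes_ok = np.allclose(beta_tilde[1:], norms * beta[1:])
intercept_ok = np.isclose(beta_tilde[0], beta[0] + means @ beta[1:])

# With demeaned predictors, the intercept estimate is the sample mean of y.
intercept_is_mean = np.isclose(beta_tilde[0], y.mean())

# Demeaning y as well drives the intercept to zero, so the column can be dropped.
y_centered = y - y.mean()
with_intercept = fit_ols(np.hstack([ones, scaled]), y_centered)
without_intercept = fit_ols(scaled, y_centered)
intercept_vanishes = np.isclose(with_intercept[0], 0.0, atol=1e-8)
same_slopes = np.allclose(with_intercept[1:], without_intercept)

print(slopes_ok, intercept_ok, intercept_is_mean, intercept_vanishes, same_slopes)
```

If only the predictors are demeaned and the intercept column is dropped, the slopes stay the same (the demeaned columns are orthogonal to the column of ones), but the fitted values are off by the mean of $Y$, which is why the intercept is still needed in that case.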
