Solved – Calculation of intercept in multiple linear regression (OLS)

least squares, multiple regression

While researching OLS, I found the following equation for calculating the coefficients:

$$
\beta = (X^\top X)^{-1}X^\top y
$$

(Ref: https://en.wikipedia.org/wiki/Linear_least_squares)

However, it does not explicitly mention how to calculate the intercept. What is the equation for it?

Best Answer

You can obtain the solution for the intercept by setting the partial derivative of the squared loss with respect to the intercept $\beta_0$ to zero. Let $\beta_0 \in \mathbb{R}$ denote the intercept, $\beta \in \mathbb{R}^d$ the feature coefficients, and $x_i \in \mathbb{R}^d$ the feature vector of the $i$-th sample. We have to solve

\begin{align} \arg\min_{\beta_0} \quad& \mathcal{L}(\beta_0, \beta) \\ \mathcal{L}(\beta_0, \beta) &= \frac{1}{2} \sum_{i=1}^n (y_i - \beta_0 - x_i^\top \beta)^2 \\ \frac{\partial}{\partial \beta_0} \mathcal{L}(\beta_0, \beta) &= -\sum_{i=1}^n (y_i - \beta_0 - x_i^\top \beta) = 0 \end{align}

All we have to do is solve for $\beta_0$:

\begin{align} \sum_{i=1}^n \beta_0 &= \sum_{i=1}^n (y_i - x_i^\top \beta) \\ \beta_0 &= \frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta) \end{align}
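Here is a minimal NumPy sketch of this formula. The data and dimensions are made up for illustration, and the cross-check against a design matrix augmented with a column of ones is just one common way to verify the result:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3                          # made-up sample size and number of features
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Slope coefficients from the centered problem (this is the beta that, together with
# the intercept, jointly minimizes the loss above), then the intercept via
# beta_0 = (1/n) * sum_i (y_i - x_i^T beta).
Xc = X - X.mean(axis=0)
yc = y - y.mean()
beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
beta0 = np.mean(y - X @ beta)

# Cross-check: least squares on a design matrix with an explicit column of ones
# yields the same intercept and slopes.
coef = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
assert np.isclose(coef[0], beta0) and np.allclose(coef[1:], beta)
```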

Usually, we assume that all features are centered, i.e., $$\frac{1}{n} \sum_{i=1}^n x_{ij} = 0 \qquad \forall j \in \{1,\ldots,d\},$$ which simplifies the solution for $\beta_0$ to be the average response: \begin{align} \beta_0 &= \frac{1}{n} \sum_{i=1}^n y_i - \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^d x_{ij} \beta_j \\ &= \frac{1}{n} \sum_{i=1}^n y_i - \sum_{j=1}^d \beta_j \frac{1}{n} \sum_{i=1}^n x_{ij} \\ &= \frac{1}{n} \sum_{i=1}^n y_i \end{align}
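Again as a sketch with made-up data: if the features are centered before fitting, the estimated intercept is exactly the average of the response.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4                                    # made-up sizes, purely illustrative
X = rng.normal(loc=3.0, size=(n, d))             # deliberately non-centered features
y = X @ rng.normal(size=d) + rng.normal(size=n)

# Center the features (but not the response), then fit with an intercept column.
Xc = X - X.mean(axis=0)
coef = np.linalg.lstsq(np.column_stack([np.ones(n), Xc]), y, rcond=None)[0]

# The fitted intercept coincides with the mean of the response.
assert np.isclose(coef[0], y.mean())
```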

If, in addition, the response $y$ is also centered, i.e., $\frac{1}{n} \sum_{i=1}^n y_i = 0$, the intercept is zero and can therefore be dropped.
