Solved – Closed form and gradient calculation for linear regression

linear-algebra, mathematical-statistics, regression

We are given a linear regression problem with one training point whose single feature is 1-dimensional, $x \in \mathbb{R}_{>0}$, and the corresponding output $y \in \mathbb{R}$. We duplicate the feature, so that we have one training point with two (identical) features.

For this setup, we have to determine whether we can apply the closed-form solution $\beta = (X^TX)^{-1}X^Ty$. Then we have to solve the linear regression problem, using the fact that $f(\beta) = \|y - X\beta\|^2$ is convex.

  • Application of the closed-form solution: For this I want to determine whether $X^TX$ has full rank. The design matrix is $X = (1, x_{11}, x_{12})$. Hence $X^TX$ is:

    $$
    \begin{bmatrix}
    1 & x_{11} & x_{12} \\
    x_{11} & x_{11}^2 & x_{11}x_{12} \\
    x_{12} & x_{11}x_{12} & x_{12}^2
    \end{bmatrix}
    $$

This matrix seems to have full rank (independent column vectors), so the closed-form solution would be applicable. Is that correct? Is there another way to determine the rank without explicitly calculating $X^TX$? A quick numerical check is sketched after this list.

  • For minimizing $f(\beta)$ I would take the derivative of $\|y - X\beta\|^2$ with respect to $\beta$ and set it equal to zero.
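(A sketch I added, not part of the original exercise; $x = 2$ is an arbitrary positive value standing in for $x_{11} = x_{12}$.) One way to check the rank question without hand-computing anything is to let NumPy report it:

```python
import numpy as np

# One training point with an intercept column and a duplicated 1-d feature.
# x = 2.0 is an arbitrary positive value standing in for x11 = x12.
x = 2.0
X = np.array([[1.0, x, x]])    # shape (1, 3): the row (1, x11, x12)

XtX = X.T @ X                  # the 3x3 matrix written out above
print(XtX)
print("rank of X:   ", np.linalg.matrix_rank(X))
print("rank of X'X: ", np.linalg.matrix_rank(XtX))
```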

Best Answer

There are some incorrect statements in your question.

(Notation note: I'll use bold letters to denote vectors.)

Let $\mathbf{b}$ be a coefficient vector we're trying to estimate. The ordinary least squares problem is:

\begin{equation} \begin{array}{*2{>{\displaystyle}r}} \mbox{minimize (over $\mathbf{b}$)} & \sum_{i=1}^n \epsilon_i^2 \\ \mbox{subject to} & y_i = \mathbf{x}_i \cdot \mathbf{b} + \epsilon_i \end{array} \end{equation}

This can be rewritten in matrix notation as:

\begin{equation} \begin{array}{*2{>{\displaystyle}r}} \mbox{minimize (over $\mathbf{b}$)} & \left(\mathbf{y} - X \mathbf{b}\right)'\left(\mathbf{y} - X \mathbf{b}\right) \end{array} \end{equation}
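Writing $\boldsymbol{\epsilon} = \mathbf{y} - X\mathbf{b}$ for the stacked residual vector makes the equivalence of the two formulations explicit:

$$\sum_{i=1}^n \epsilon_i^2 = \boldsymbol{\epsilon}'\boldsymbol{\epsilon} = \left(\mathbf{y} - X \mathbf{b}\right)'\left(\mathbf{y} - X \mathbf{b}\right).$$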

This is an unconstrained, convex optimization problem, and a gradient of zero is a necessary and sufficient condition for an optimum. You can show that the first-order condition is:

$$(X'X) \mathbf{b} = X'\mathbf{y} $$
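To see where this comes from, expand the objective and set its gradient with respect to $\mathbf{b}$ to zero:

$$\left(\mathbf{y} - X\mathbf{b}\right)'\left(\mathbf{y} - X\mathbf{b}\right) = \mathbf{y}'\mathbf{y} - 2\,\mathbf{b}'X'\mathbf{y} + \mathbf{b}'X'X\mathbf{b}, \qquad \nabla_{\mathbf{b}} = -2X'\mathbf{y} + 2X'X\mathbf{b} = \mathbf{0},$$

which rearranges to the normal equations stated above.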

Any $\mathbf{b}$ that satisfies that equation will be an optimum. In the case where $X'X$ is full rank, $X'X$ can be inverted to obtain the unique solution:

$$ \mathbf{b} = (X'X)^{-1}X' \mathbf{y}$$

If $X'X$ is not full rank, then you cannot use that formula! In this case, there isn't a unique solution. Instead, there is an affine subspace of solutions to the normal equations, all of which achieve the same minimal sum of squared errors (which happens to be zero in the one-observation example below, but need not be zero in general).
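A small NumPy sketch of this situation (my own illustration; the numbers are arbitrary): with two identical columns the inverse formula fails, while `np.linalg.lstsq` still returns one member of the solution subspace (the minimum-norm one), and the minimal sum of squares need not be zero when $\mathbf{y}$ is not in the column space of $X$:

```python
import numpy as np

# Two observations, two identical columns -> X'X is singular (numbers are arbitrary).
X = np.array([[1.0, 1.0],
              [2.0, 2.0]])
y = np.array([1.0, 3.0])

XtX, Xty = X.T @ X, X.T @ y
try:
    b = np.linalg.inv(XtX) @ Xty            # closed-form formula: not applicable here
except np.linalg.LinAlgError:
    print("X'X is singular, so the inverse formula cannot be used")

# lstsq returns the minimum-norm solution of the normal equations.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)                                     # -> roughly [0.7, 0.7]
print(np.sum((y - X @ b) ** 2))              # minimal SSE (0.2 here), not zero
```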

Other notes:

  • $\operatorname{Rank}(X) = \operatorname{Rank}(X'X)$.
  • The case where $X'X$ is rank deficient or close to rank deficient is known as multicollinearity.

Here is how I interpret the first part of your question:

We are given a linear regression problem with one training point whose single feature is 1-dimensional, $x \in \mathbb{R}_{>0}$, and the corresponding output $y \in \mathbb{R}$. We duplicate the feature, so that we have one training point with two (identical) features.

I interpret this as if you have one training observation $(y, x, x)$. It doesn't even say that an intercept is included, and for simplicity, I won't include one. The design matrix is: $$ X = \begin{bmatrix} x & x \end{bmatrix}$$

This is obviously rank 1. Hence $X'X$ is also rank one because $\operatorname{Rank}(X) = \operatorname{Rank}(X'X)$.

$$ X'X = \begin{bmatrix}x^2 & x^2 \\ x^2 & x^2 \end{bmatrix}$$ That matrix has one linearly independent row or column. The matrix is not invertible.

Any vector $\mathbf{b}$ that solves $$ \begin{bmatrix} x^2 & x^2 \\ x^2 & x^2\end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} xy \\ xy \end{bmatrix} $$ is a solution. So as long as $b_1 + b_2 = \frac{y}{x}$, the sum of squared errors will be zero.
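To make this concrete, here is a small check I added (the values $x = 2$ and $y = 3$ are arbitrary): the pseudoinverse picks the symmetric member $b_1 = b_2 = \frac{y}{2x}$ of the solution family, and any pair with $b_1 + b_2 = \frac{y}{x}$ fits the single observation exactly:

```python
import numpy as np

x, y = 2.0, 3.0                        # arbitrary single observation
X = np.array([[x, x]])                 # design matrix [x, x], no intercept

# Minimum-norm solution: b1 = b2 = y / (2x) = 0.75.
b = np.linalg.pinv(X) @ np.array([y])
print(b)                               # -> [0.75, 0.75]

# Any b with b1 + b2 = y/x gives zero squared error on the single observation.
for b1 in (0.0, 1.0, -2.5):
    b_any = np.array([b1, y / x - b1])
    print((y - X @ b_any)[0] ** 2)     # -> 0.0 each time
```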