Left inverse of a matrix and full column rank

inverse, linear-algebra, matrices, matrix-rank

Dr. Strang, in his book Linear Algebra and Its Applications (p. 108), says the following when talking about the left inverse of an $m$ by $n$ matrix:

UNIQUENESS: For full column rank $r = n$: $Ax = b$ has at most one solution $x$ for every $b$ if and only if the columns are linearly independent. Then $A$ has an $n$ by $m$ left-inverse $B$ such that $BA = I_{n}$. This is possible only if $m \geq n$.

I understand why there can be at most one solution when $A$ has full column rank, but how does that lead to $A$ having a left inverse?

I'd be grateful if someone could help or hint at the answer.

Best Answer

If $A$ is $m \times n$, then the following are equivalent:

  1. $A$ has full column rank $n$
  2. The columns of $A$ are linearly independent
  3. The null space of $A$ is trivial
  4. The map induced by $A$ is injective
  5. $A$ has a left inverse

Proof that 1 $\iff$ 2:

Immediate from the definition of column rank.

Proof that 2 $\iff$ 3:

Observe that the vector $Ax$ is equal to the linear combination $\sum_{i=1}^{n}a_i x_i$, where $a_i$ is the $i$'th column of $A$, and $x_i$ is the $i$'th component of $x$.
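For a concrete illustration (the $3 \times 2$ matrix here is just a made-up example, not one from the question):

$$\begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}$$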

In particular, $Ax = 0$ if and only if $\sum_{i=1}^{n}a_i x_i = 0$.

The null space of $A$ is trivial if and only if $x=0$ is the only solution to $Ax = 0$ which, by what we said above, is true if and only if $\sum_{i=1}^{n}a_i x_i = 0$ implies $x_i = 0$ for all $i$, which is true if and only if $a_1, a_2, \ldots, a_n$ are linearly independent.

Proof that 3 $\iff$ 4:

Suppose that $Ax = Ay$. Since $A$ is linear, this is equivalent to $Ax - Ay = A(x-y) = 0$. Therefore $x-y$ is in the null space of $A$. But the null space of $A$ is trivial, hence $x-y = 0$, so $x=y$. This shows that (the map induced by) $A$ is injective (one-to-one).

Conversely, suppose that $A$ is injective. Then $x=0$ is the unique vector such that $Ax = 0$. Therefore the null space of $A$ is trivial.

Proof that 2 (and equivalently 4) $\implies$ 5:

Let $e_1, e_2, \ldots, e_n$ be the canonical basis for $\mathbb R^n$, meaning that $e_i$ has a $1$ in the $i$'th component, and zeros everywhere else. Note that for each $i$ we have $a_i = Ae_i$, where again $a_i$ is the $i$th column of $A$. Moreover, since $A$ is injective, $e_i$ is the unique vector that is mapped by $A$ to $a_i$.

Now, since $a_1, a_2, \ldots, a_n$ are linearly independent, they are a basis for the column space of $A$, which can be extended to a basis $a_1,a_2,\ldots, a_n, b_1,b_2,\ldots,b_{m-n}$ for $\mathbb R^m$. Hence an arbitrary $y \in \mathbb R^m$ has a unique representation of the form $y = \sum_{i=1}^{n} c_i a_i + \sum_{j=1}^{m-n} d_j b_j$ where $c_i$ and $d_j$ are scalars.

Therefore we can define a linear map $g : \mathbb R^m \to \mathbb R^n$ by first setting $g(a_i) = e_i$ for each $i=1,2,\ldots,n$ and $g(b_j) = 0$ for each $j=1,2,\ldots,m-n$, and then extending $g$ linearly to all of $\mathbb R^m$:

$$g(y) = g\left(\sum_{i=1}^{n} c_i a_i + \sum_{j=1}^{m-n} d_j b_j \right) = \sum_{i=1}^{n} c_i g(a_i) + \sum_{j=1}^{m-n} d_j g(b_j) = \sum_{i=1}^{n} c_i g(a_i) = \sum_{i=1}^{n} c_i e_i$$

Then $g$ is a left inverse of $A$:

$$g(Ax) = g\left(\sum_{i=1}^{n}a_i x_i\right) = \sum_{i=1}^{n} x_i g(a_i) = \sum_{i=1}^{n} x_i e_i = x$$
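To make this construction concrete, here is a small numerical sketch in Python/NumPy. The particular matrix $A$, and the use of a complete QR factorization to produce the extension vectors $b_j$, are my own illustrative choices, not part of the argument above.

```python
import numpy as np

# Illustrative 4 x 2 matrix with linearly independent columns (full column rank).
A = np.array([[1., 0.],
              [2., 1.],
              [0., 3.],
              [1., 1.]])
m, n = A.shape

# Extend the columns a_1, ..., a_n to a basis of R^m.  A complete QR
# factorization gives an orthonormal basis of R^m whose last m - n columns
# span the orthogonal complement of the column space, so they can serve as
# b_1, ..., b_{m-n}.
Q, _ = np.linalg.qr(A, mode='complete')
B_ext = Q[:, n:]
M = np.hstack([A, B_ext])             # invertible m x m matrix [a_1 ... a_n  b_1 ... b_{m-n}]

# g is defined by g(a_i) = e_i and g(b_j) = 0, i.e. G @ M = [I_n | 0],
# so the matrix of g is G = [I_n | 0] M^{-1}.
G = np.hstack([np.eye(n), np.zeros((n, m - n))]) @ np.linalg.inv(M)

print(np.allclose(G @ A, np.eye(n)))  # True: g(Ax) = x for every x
```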

Proof that 5 $\implies$ 3:

Suppose that $Ax = 0$. Let $g$ be a left inverse of $A$. Then $x = g(Ax) = 0$. This shows that the null space of $A$ is trivial.


As a side note, it turns out that 4 and 5 are equivalent for general functions, not just linear maps. If $f$ is any injective function, then it has a left inverse, and conversely if $f$ is any function that has a left inverse, then it is injective. There is a proof here, for example. Since you indicated in the comments that this is an unfamiliar fact, I did not use it in the proof above but instead constructed a left inverse explicitly.


Note that my proof shows why a left inverse of $A$ must exist if $A$ has full column rank, but it doesn't explicitly show how to compute the left inverse.

As Strang notes, one formula for a left inverse is $B = (A^T A)^{-1} A^T$. That this is a left inverse is clear by computing:

$$BA = ((A^T A)^{-1} A^T) A = (A^T A)^{-1} (A^T A) = I_n$$
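As a quick sanity check, this can also be verified numerically; the matrix below is an arbitrary full-column-rank example, not one from Strang's book.

```python
import numpy as np

A = np.array([[1., 0.],
              [2., 1.],
              [0., 3.],
              [1., 1.]])                 # 4 x 2 with independent columns

B = np.linalg.inv(A.T @ A) @ A.T         # Strang's formula B = (A^T A)^{-1} A^T
print(B.shape)                           # (2, 4): an n x m matrix
print(np.allclose(B @ A, np.eye(2)))     # True: B A = I_n
```

(For a matrix with full column rank, `np.linalg.pinv(A)` computes this same left inverse.)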

But as you will have noted, Strang punts to a later chapter the proof that $A^T A$ is invertible when $A$ has full column rank. So that's not very satisfactory!

Also, computing Strang's left inverse is very inefficient because it involves inverting $A^T A$. This requires a lot of calculation: $A^T A$ is $n \times n$, and inverting it takes on the order of $n^3$ operations.

In practice, probably the best way to compute a left inverse is to perform row reduction on $A$ to bring it to the form

$$\begin{bmatrix} I_n \\ 0_{(m-n) \times n} \end{bmatrix}$$

where $I_n$ is the $n \times n$ identity matrix, and $0_{(m-n) \times n}$ is the $(m - n) \times n$ matrix consisting of all zeros. Row reduction to this form is possible if and only if the columns of $A$ are linearly independent.

Assuming you're familiar with row reduction, you probably know that each row operation can be expressed as an $m \times m$ elementary matrix of one of three forms, corresponding to the three row reduction operations (multiplying a row by a scalar, interchanging two rows, and adding a scalar multiple of one row to another). The row reduction procedure can then be expressed by left-multiplying $A$ by the corresponding elementary matrices. Assuming there are $k$ of these, we have:
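For example, with $m = 3$ (the scalar $c$ and the particular rows involved are arbitrary), the three types look like

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ c & 0 & 1 \end{bmatrix}$$

which multiply row $2$ by $c$, swap rows $1$ and $2$, and add $c$ times row $1$ to row $3$, respectively.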

$$E_k E_{k-1} \cdots E_2 E_1 A = \begin{bmatrix} I_n \\ 0_{(m-n) \times n} \end{bmatrix}$$

The product $E_k E_{k-1} \cdots E_2 E_1$ is easy to understand conceptually: it corresponds to the $k$ row operations used to bring $A$ into the reduced form. Fortunately, it's not necessary to compute $E_k E_{k-1} \cdots E_2 E_1$ as a product of $k$ matrices! Instead you compute it by starting with $I_{m}$ and performing the same row operations on it as you perform on $A$.

In any case, denoting $E_k E_{k-1} \cdots E_2 E_1$ by $B$, the above becomes

$$BA = \begin{bmatrix} I_n \\ 0_{(m-n) \times n} \end{bmatrix}$$

Note that $B$ is an $m \times m$ matrix. It is almost the left inverse we seek, except that we want just $I_n$ on the right-hand side, and a left inverse should be $n \times m$, not $m \times m$. If $m > n$ then the right-hand side has $m-n$ spare rows of zeros at the bottom. To get rid of these, we can simply remove the bottom $m-n$ rows of $B$ to get an $n \times m$ matrix $B'$ which satisfies $B'A = I_n$ and is therefore a left inverse of $A$, as desired!
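Here is a sketch of that procedure in Python, written from scratch for illustration (it is not code from Strang's book); partial pivoting is added for numerical stability, and the matrix $A$ is again just an example.

```python
import numpy as np

def left_inverse_by_row_reduction(A):
    """Row reduce [A | I_m]; once the A-part becomes [I_n; 0], the first n
    rows of the I_m-part form a left inverse B' of A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    aug = np.hstack([A, np.eye(m)])      # every row operation acts on both halves
    for col in range(n):
        # pick a pivot row (partial pivoting) and swap it into position `col`
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        aug[[col, pivot]] = aug[[pivot, col]]
        aug[col] /= aug[col, col]        # scale the pivot row so the pivot is 1
        for row in range(m):             # eliminate the rest of the column
            if row != col:
                aug[row] -= aug[row, col] * aug[col]
    # the left half is now [I_n; 0]; the right half is the product E_k ... E_1 = B
    return aug[:n, n:]                   # keep only the top n rows: B'

A = np.array([[1., 0.],
              [2., 1.],
              [0., 3.],
              [1., 1.]])
B_prime = left_inverse_by_row_reduction(A)
print(np.allclose(B_prime @ A, np.eye(2)))   # True: B' A = I_n
```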