Understanding Artin’s derivation of the determinant formula

linear-algebra, proof-explanation

I'm trying to understand Section 1.4 of Artin. I'll mention some context so that the notation he's using is clear, even though this part already makes sense to me.

He defines a permutation matrix $P$ associated with a permutation $p$. Then if $e_{ij}$ are "matrix units," matrices with a $1$ in the $ij$th place and $0$'s elsewhere, we have
$$P = e_{p(1)1} + \cdots + e_{p(n)n}.$$
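For concreteness, if $n = 3$ and $p$ is the permutation with $p(1) = 2$, $p(2) = 3$, $p(3) = 1$, then
$$P = e_{21} + e_{32} + e_{13} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$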
So far, I understand. Then he introduces linearity of the determinant. In the $2 \times 2$ case, it gives:
\begin{align*}
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \det \begin{pmatrix} a & 0 \\ c & 0 \end{pmatrix} + \det \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} + \det \begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix} + \det \begin{pmatrix} 0 & b \\ 0 & d \end{pmatrix}.
\end{align*}

The first and fourth matrices have a column of $0$'s, so their determinant is $0$. So far so good. He calls the remaining matrices $M$ and comments that they have exactly one entry left in each row and each column. I can see this in the $2 \times 2$ case, but I don't understand why it is true in general, though it must follow from linearity of the determinant.

Assuming this is true about $M$, $M$ is basically a permutation matrix, but instead of entries of $1$, its entries are values $a_{ij}$ coming from the matrix $A$ whose determinant we're taking, so we have
$$M = \sum\limits_{j} a_{p(j)j} e_{p(j)j}.$$
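For example, the third matrix in the $2 \times 2$ expansion above corresponds to the transposition $p(1) = 2$, $p(2) = 1$: it is $M = c\,e_{21} + b\,e_{12}$, with $a_{21} = c$ and $a_{12} = b$.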
He then states that by linearity, we have:
\begin{align*}
\det M & = (a_{p(1)1} \cdots a_{p(n)n}) \det P \\
& = (\text{sign $p$})(a_{p(1)1} \cdots a_{p(n)n}).
\end{align*}

I don't understand where either of these lines come from, and this may be my greatest source of confusion. I tried to work out a few examples by hand, but it wasn't particularly helpful.

The next observation is that there is "one such term for each permutation," so we can sum over them:
\begin{align*}
\det A & = \sum\limits_{\text{perm $p$}} (\text{sign $p$}) a_{p(1)1} \cdots a_{p(n)n} \\
& = \sum\limits_{\text{perm $p$}} (\text{sign $p$}) a_{1p(1)} \cdots a_{np(n)}
\end{align*}

By "one such term for each permutation," I assume he concretely means that if we're looking at an $n \times n$ matrix, there are $n!$ permutations of its columns, so there are $n!$ matrices $M$ (the two above in the $2 \times 2$ case) and we want to sum over each of those. I believe I understand that, but I don't understand where the second, "transpose form" comes from or why it's equivalent. Is this using the fact that the determinant of a matrix equals the determinant of its transpose? Even if it's using that fact, I don't understand why the indices shift in this way.

I would appreciate it if someone could help clarify these points.

Best Answer

Let's observe that a determinant is a multilinear function of the rows (or columns) of a matrix.

Let's use the above fact on a $2\times 2$ matrix $M$ given by $$M=\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix} $$ Also let's note that the unit row vectors can be written as $$e_1=(1,0),\quad e_2=(0,1)$$ and we can write a typical row vector $r=(a, b) $ as $$r=ae_1+be_2$$

Next we can write $$\operatorname {det} (M) =\operatorname {det} \begin{bmatrix} a_{11}e_1+a_{12}e_2\\a_{21}e_1+a_{22}e_2 \end{bmatrix} $$ which, by multilinearity applied to the first row, equals $$\operatorname {det} \begin{bmatrix} a_{11}e_1\\ a_{21}e_1 + a_{22}e_2\end{bmatrix}+\operatorname {det} \begin{bmatrix} a_{12}e_2\\ a_{21}e_1+ a_{22}e_2\end{bmatrix} $$ This can be further split, now using linearity in the second row, as $$\operatorname {det} \begin{bmatrix} a_{11}e_1\\ a_{21} e_1\end{bmatrix}+\operatorname {det} \begin{bmatrix} a_{11}e_1 \\ a_{22}e_2 \end{bmatrix}+\operatorname {det} \begin{bmatrix} a_{12}e_2 \\ a_{21}e_1\end{bmatrix}+\operatorname {det} \begin{bmatrix} a_{12}e_2\\ a_{22}e_2\end{bmatrix} $$ and we can take out the matrix entries to get $$\operatorname {det} (M) =a_{11}a_{21}\operatorname {det} \begin{bmatrix} e_1\\ e_1\end{bmatrix}+a_{11}a_{22}\operatorname {det} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}+a_{12}a_{21}\operatorname {det} \begin{bmatrix} e_2 \\ e_1\end{bmatrix}+a_{12}a_{22}\operatorname {det} \begin{bmatrix} e_2\\ e_2\end{bmatrix} $$

Notice that the determinants with two equal rows vanish, and hence we are left with $$\operatorname {det} (M) = a_{11}a_{22}\operatorname {det} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}+a_{12}a_{21}\operatorname {det} \begin{bmatrix} e_2 \\ e_1\end{bmatrix}$$ You will thus note that the only surviving terms are those whose rows form a permutation of the unit row vectors (this happens in exactly the same manner with $n\times n$ matrices, and we are left with exactly $n!$ determinants).
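If it helps, here is a quick symbolic check of this four-term split (a minimal sketch assuming `sympy` is installed; the variable names are mine):

```python
# Verify the four-term row expansion of a symbolic 2x2 determinant.
from sympy import Matrix, simplify, symbols

a11, a12, a21, a22 = symbols('a11 a12 a21 a22')
M = Matrix([[a11, a12], [a21, a22]])

# The four matrices produced by expanding each row over e_1 and e_2.
terms = [
    Matrix([[a11, 0], [a21, 0]]),   # rows a11*e1, a21*e1 -> equal rows
    Matrix([[a11, 0], [0, a22]]),   # rows a11*e1, a22*e2 -> identity perm
    Matrix([[0, a12], [a21, 0]]),   # rows a12*e2, a21*e1 -> transposition
    Matrix([[0, a12], [0, a22]]),   # rows a12*e2, a22*e2 -> equal rows
]

print([t.det() for t in terms])     # [0, a11*a22, -a12*a21, 0]
print(simplify(sum(t.det() for t in terms) - M.det()))  # 0
```

The printed list shows directly which two of the four terms vanish and which two survive.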

If $\sigma$ is a permutation on $n$ symbols (more formally, $\sigma$ is a bijection from the set $\{1,2,\dots,n\}$ to itself), then we define the sign of the permutation $\sigma$ as $$\operatorname {sign} (\sigma) =\operatorname {det} \begin{bmatrix} e_{\sigma(1)}\\e_{\sigma(2)}\\ \vdots\\e_{\sigma(n)} \end{bmatrix} $$ The above matrix can always be transformed into the identity matrix $[e_1,e_2,\dots, e_n] ^{T} $ via a finite number of row exchanges, and hence the sign of a permutation $\sigma$ is $(-1)^j$, where $j$ is the number of transpositions applied to $\sigma$ to reduce it to the identity permutation (one can prove that different reduction sequences may give different values of $j$, but always of the same parity, so $(-1)^j$ is well defined). Using this definition one can show that $\operatorname {sign} (\sigma) =\operatorname {sign} (\sigma^{-1})$.
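Here is a small computational sketch of this definition of sign, counting transpositions with a bubble sort (plain Python; the helper names `sign` and `inverse` are mine):

```python
def sign(perm):
    """Sign of a 0-indexed permutation: bubble-sort it to the identity
    and count the adjacent transpositions used."""
    p, swaps = list(perm), 0
    for _ in range(len(p)):
        for j in range(len(p) - 1):
            if p[j] > p[j + 1]:
                p[j], p[j + 1] = p[j + 1], p[j]
                swaps += 1
    return (-1) ** swaps

def inverse(perm):
    """Inverse permutation: if perm maps i to j, the inverse maps j to i."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

sigma = (2, 0, 1)   # in 1-based terms: the 3-cycle 1 -> 3, 3 -> 2, 2 -> 1
print(sign(sigma))                          # 1 (a 3-cycle is even)
print(sign(sigma) == sign(inverse(sigma)))  # True
```

Any other sequence of swaps that sorts the permutation would give the same parity, which is exactly the well-definedness claim above.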

Coming back to our computation with the $2\times 2$ matrix $M$, we see that $$\operatorname {det} (M) =\sum_{\sigma}\operatorname {sign} (\sigma) a_{1\sigma(1)}a_{2\sigma(2)}=a_{11}a_{22}-a_{12}a_{21}$$ A similar formula holds in general: if $M=[a_{ij}] _{n\times n} $ is a square matrix of order $n$ with entry $a_{ij} $ in the $i$-th row and $j$-th column, then $$\operatorname {det} (M) =\sum_{\sigma} \operatorname {sign} (\sigma) a_{1\sigma(1)}a_{2\sigma(2)}\dots a_{n\sigma(n)} $$ where $\sigma$ runs over all permutations of the symbols $1,2,\dots,n$.
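As a sanity check, one can implement this sum directly and compare it against a library determinant (a sketch assuming `numpy`; `leibniz_det` and `sign` are my own names):

```python
# Direct implementation of the permutation sum, checked against numpy.
from itertools import permutations

import numpy as np

def sign(perm):
    # Parity via the inversion count of a 0-indexed permutation.
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return (-1) ** inversions

def leibniz_det(A):
    n = len(A)
    return sum(sign(p) * np.prod([A[i][p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.random.rand(4, 4)
print(np.isclose(leibniz_det(A), np.linalg.det(A)))  # True
```

Note that the sum has $n!$ terms, so this is a check of the formula rather than a sensible way to compute determinants for large $n$.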

So you can see that we use the following properties of the determinant to prove the desired formula:

  • it is a linear function of each row (multilinearity);
  • it is $0$ if two rows are the same (a consequence of the fact that the determinant changes sign when two rows are exchanged, i.e., it is an alternating function of the row vectors);
  • it takes the value $1$ on the identity matrix.

The permutation formula above shows that any function with these three properties is uniquely determined, and hence the three properties characterize the determinant.

Next, if $\tau$ is the inverse of the permutation $\sigma$ and $\sigma(i) =j$, then $\tau(j) =i$ and thus $a_{i\sigma(i)} =a_{\tau(j) j} $. Let $M=[a_{ij}]_{n\times n} $ and let $M^{T} =[b_{ij}] _{n\times n} $ be the transpose of $M$, so that $b_{ij} =a_{ji} $.

Then $$\operatorname {det} (M) =\sum_{\sigma} \operatorname {sign} (\sigma) a_{1\sigma(1)}a_{2\sigma(2)}\dots a_{n\sigma(n)} $$ We can now rewrite each term in the above sum, reordering the factors so that the second index of $a$ runs in order $1,2,\dots,n$, which gives $$a_{1\sigma(1)}a_{2\sigma(2)}\dots a_{n\sigma(n)}=a_{\tau(1)1}a_{\tau(2)2}\dots a_{\tau(n)n}$$ As $\sigma$ runs over all permutations on $n$ symbols, so does its inverse $\tau$, and since $\operatorname {sign} (\sigma) =\operatorname {sign} (\tau)$ we get $$\operatorname {det} (M) =\sum_{\tau}\operatorname {sign} (\tau) a_{\tau(1)1}\dots a_{\tau(n) n} =\sum_{\tau} \operatorname {sign} (\tau) b_{1\tau(1)}\dots b_{n\tau(n)} =\operatorname {det} (M^T) $$
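The reindexing above can also be checked term by term: for $\tau = \sigma^{-1}$, the $\sigma$-term of the column-indexed sum equals the $\tau$-term of the row-indexed sum (a sketch assuming `numpy`; helper names are mine):

```python
# Term-by-term check of the reindexing step for all 3! permutations.
from itertools import permutations

import numpy as np

def sign(perm):
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return (-1) ** inversions

def inverse(perm):
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

A = np.random.rand(3, 3)
for sigma in permutations(range(3)):
    tau = inverse(sigma)
    col_term = sign(sigma) * np.prod([A[sigma[j], j] for j in range(3)])
    row_term = sign(tau) * np.prod([A[i, tau[i]] for i in range(3)])
    assert np.isclose(col_term, row_term)
print("all 6 terms match")
```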


The above procedure also proves that if $f$ is any alternating, multilinear function of the rows of $n\times n$ matrices, then $$f(M) =(\operatorname {det} (M))\, f(I) $$ for any square matrix $M$ of order $n$, where $I$ is the identity matrix of order $n$.
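For a concrete instance of this last claim (again assuming `numpy`): the function $f(M) = \det(MB)$, for a fixed matrix $B$, is multilinear and alternating in the rows of $M$, so the result forces $f(M) = \det(M)\,f(I)$, i.e. $\det(MB) = \det(M)\det(B)$:

```python
# Numerical check that f(M) = det(M B) satisfies f(M) = det(M) f(I).
import numpy as np

rng = np.random.default_rng(seed=0)
B = rng.random((3, 3))
M = rng.random((3, 3))

def f(X):
    return np.linalg.det(X @ B)

print(np.isclose(f(M), np.linalg.det(M) * f(np.eye(3))))  # True
```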
