[Math] Understanding the derivation of the general formula for determinant

determinantlinear algebra

I'm reading about the derivation of the general formula for the determinant of matrix $T$ from Axler Sheldon's paper Down with Determinants (page 15):

We begin our search for a formula for the determinant by considering
matrices of a special form. Let $a_1, …, a_n \in \mathbb{C}$.
Consider a linear operator $T$ whose matrix is

$$\left[\begin{array}[ccccc] & 0 & & & & a_n\\ a_1 & 0 & & & \\
& a_2 & 0 & & \\ & & \ddots & \ddots & \\ & & & a_{n-1}& 0
\end{array}\right];\;\;\;\;\;\;\;\;\;\;(9.3)$$

here all entries of the matrix are $0$ except for the upper right-hand
corner and along the line just below the main diagonal. Let’s find the
determinant of $T$. Note that $T^n=a_1\cdots a_nI$. Because the first
columns of $\{I, T, …, T^{n-1}\}$ are linearly independent (assuming
that none of the $a_j$ is $0$), no polynomial of degree less than $n$
can annihilate (i.e. polynomial $q$ s.t. $q(T)=0$) $T$. Thus $z^n-a_1\cdots a_n$ is the minimal
polynomial of $T$. Hence $z^n-a_1\cdots a_n$ is also the
characteristic polynomial of $T$. Thus

$$\det T = (-1)^{n-1}a_1\cdots a_n.$$

(If some $a_j$ is $0$, then clearly $T$ is not invertible, so $\det T
= 0$, and the same formula holds.)

Now let $\tau$ be a permutation of $\{1 ,…,n\}$, and consider a matrix $T$ whose $j^{th}$ column
consists of all zeroes except for $a_j$ in the $\tau(j)^{th}$ row. The
permutation $\tau$ is a product of cyclic permutations. Thus $T$ is
similar to (and so has the same determinant as) a block diagonal
matrix where each block of size greater than one has the form of
(9.3). The determinant of a block diagonal matrix is obviously the
product of the determinants of the blocks, and we know from the last
paragraph how to compute those.

Thus we see that $\det T =
(\text{sign}\;\tau) \,a_1 \cdots a_n$. To put this into a form that
does not depend upon the particular permutation $\tau$, let
$\tau_{i,j}$ denote the entry in row row $i$, column $j$, of $T$ (so
$t_{i,j} = 0$ unless $i=\tau(j)$), and let $P(n)$ denote the set of
all permutations of $\{1, …, n\}$. Then

$$\det T = \sum_{\pi \in P(n)} (\text{sign}\,\pi) \;t_{\pi(1), 1},
…, t_{\pi(n), n}\;\;\;\;\;\;\;\;\;\;(9.4)$$

because each summand is $0$ except the one corresponding to the
permutation $\tau$. Consider now an arbitrary matrix $T$ with entries
$t_{i,j}$. Using the paragraph above as motivation, we guess that the
formula for $\det T$ is given by (9.4). The next proposition shows
that this guess is correct and gives the usual formula for the
determinant of a matrix.

Proposition 9.5

$$\det(T) = \sum_{\pi \in P(n)} (\text{sign}\,\pi) \;t_{\pi(1), 1},
…, t_{\pi(n), n}.$$

Now the part where I have problems is the following:

(1) Now let $\tau$ be a permutation of $\{1 ,…,n\}$, and consider a matrix $T$ whose $j^{th}$ column
consists of all zeroes except for $a_j$ in the $\tau(j)^{th}$ row. (2) The
permutation $\tau$ is a product of cyclic permutations. (3) Thus $T$ is
similar to (and so has the same determinant as) a block diagonal
matrix where each block of size greater than one has the form of
(9.3). (4) The determinant of a block diagonal matrix is obviously the
product of the determinants of the blocks, and we know from the last
paragraph how to compute those.

Here's my questions from this part:

  1. What form does $T$ have in this case? Does it have the same form as a permutation matrix except that the entries are $a_j$s?
  2. "The permutation $\tau$ is a product of cyclic permutations." What does this mean?
  3. I did not understand why $T$ is similar to block diagonal matrix? Why is it? Can the blocks be different in size? What does size mean in this context, the row-column dimension of the matrix?
  4. "The determinant of a block diagonal matrix is obviously the
    product of the determinants of the blocks"
    . This is not obvious to me. Why is this true?

Thank you for any help!

P.S. If you need any more information, please inform me. The relevant pages (necessary propositions, definitions etc.) in the reference paper above regarding my question are: 7, 8, 9, 14, 15.

Please note that I would appreciate a lot if I could get determinant-free answers (if you can), meaning that the answer for example to question (4) would not use any properties like these but rather to use e.g. eigenvalues since this is Sheldon's approach in the paper.

Best Answer

  1. Correct. It is a permutation matrix with entries $a_i$ instead of $1$.

2 and 3. This involves some concepts from group theory. I will state the basic ones and show the others using examples. A permutation is a bijective map from a set $A$ to itself. All permutations on $A$ form a group, denoted by $S_A$, with operation $\circ$, the composition of permutations. A nice property of $S_n$ (permutation group on $n$ elements) is that it can be written as a product of disjoint cycles. (To define cycle, we will have to define orbit and equivalence classes.) For our purpose, I will just use an example to illustrate this statement. Let $$\sigma=\begin{pmatrix} 1&2&3&4&5&6\\ 6&5&2&4&3&1\end{pmatrix}\in S_6.$$ This means the map $\sigma$ maps $1$ to $6$, $2$ to $5$, etc. We see a cycle $1\rightarrow 6\rightarrow 1$, and another cycle $2\rightarrow 5\rightarrow 3\rightarrow 2$. Notice that $4$ does not change under $\sigma$. So we write $\sigma=(1 6)(2 5 3)$. This is equivalent to $(6 1)(5 3 2)=(6 1)(3 2 5)$. Now let's consider the matrix $T$ corresponding to the permutation $\sigma$. $$T=\begin{pmatrix} 0&0&0&0&0&a_6\\ 0&0&a_3&0&0&0\\ 0&0&0&0&a_5&0\\ 0&0&0&a_4&0&0\\ 0&a_2&0&0&0&0\\ a_1&0&0&0&0&0 \end{pmatrix}$$ Remember that we want to group $a_1,a_6$, and $a_2,a_5, a_3$ together. Notice that moving $a_1, a_6$ to the first two rows and columns involve same permutations on the rows and the columns. This is equivalent as multiplying by some permutation $P$ on the left and $P^T$ on the right, which gives a similar matrix. Same for the other cycle $(2 5 3)$. Because of the cycle, $a_2$ is placed in row $5$, $a_5$ is placed in row $3$ and $a_3$ is placed in row $2$. So performing same permutation on rows and columns put them in the next diagonal block. Again we get similar matrix. The result is as follows: $$\left(\begin{array}{cc|ccc|c} a_1&0&0&0&0&0\\ 0&a_6&0&0&0&0\\ \hline 0&0&0&a_3&0&0\\ 0&0&0&0&a_5&0\\ 0&0&a_2&0&0&0\\ \hline 0&0&0&0&0&a_4 \end{array}\right)$$

Now you can see the first $2\times 2$ block, followed by a $3\times 3$ block. We need to rearrange the letters to make them into the form in (9.3). I will show this using the $3\times 3$ block. We can change $a_2, a_3,a_5$ to $b_1, b_2, b_3$ and we see that this $3\times 3$ block represents a permutation $\tau=(1 3 2)$. If you search for "conjugating a permutation", you will see that there exists a permutation $\delta$, such that $\delta^{-1}\tau\delta=(1 2 3)$, which gives you the form we want. And obviously this gives us a similar matrix. (Note $P_{\sigma\circ \tau}=P_{\sigma}P_{\tau}$.)

  1. Now let's suppose that $T$ is in the desired form, with $T_1, \dots, T_r$ the blocks. Let $x_1, \dots, x_r$ be any eigenvectors of $T_1, \dots, T_r$, with eigenvalues $\lambda_1, \dots, \lambda_r$, respectively. We see that $$T\cdot \begin{pmatrix}0\\ \vdots\\ 0\\ x_i\\ 0\\ \vdots\\ 0\end{pmatrix}=\lambda_i\cdot \begin{pmatrix}0\\ \vdots\\ 0\\ x_i\\ 0\\ \vdots\\ 0\end{pmatrix}$$ because of the block structure of $T$. This means the eigenvalues of $T$ consists exactly the eigenvalues of all $T_i$. Hence the determinant of $T$ is the product of the determinants of $T_i$.

I hope this helps.

Edit:

You can see that (9.3) is the matrix corresponding to the permutation $$\sigma=\begin{pmatrix} 1&2&3&4&5&6\\ 2&3&4&5&6&1\end{pmatrix}\in S_6.$$ Using the cycle notation, it is $(123456)$.

Edit: (To answer the question on $\text{sign}\pi$) The sign of a permutation is defined to be $-1$ to the power of the number of transpositions the permutation can be decomposed to. For example, $$(123456)=(16)(15)(14)(13)(12)$$ So the sign of $(123456)$ is $(-1)^5=-1$. $$(16)(253)=(16)(25)(23)$$ So the sign of $(16)(253)$ is $(-1)^3=-1$. Basically, a cycle $(12\cdots n)$ can be written as $(1n)(1, n-1)\cdots(12)$. So the sign of it is $(-1)^{n-1}$. You can see this in the determinant formula of (9.3). Notice that the sign of permutation does not change when you decompose it or change its form to disjoint cycles. This gives you the formula $$\det T=\text{sign}\tau a_1\cdots a_n$$ where $\tau$ is any permutation and $T$ is the corresponding matrix.

Now for the $\pi$, the author is just saying that for our specific $T$, there is only one permutation, namely $\tau$, that contributes to the summation. All the other terms are zero. He put it in this general form so that he can make a guess for the general case, which he does in Proposition 9.5. So $\text{sign}\pi$ is just the $\text{sign}\tau$ we discussed in the previous paragraph.