This is my proof, without introducing any new notation.
Continuing from the induction hypothesis,
$$\det{A}
=\sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\det{A_{1,j}}
=\sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\sum_{\sigma\in S_n}\text{sgn }\sigma\prod_{i=1}^{n}[A_{1,j}]_{i,\sigma(i)}$$
Denote $[n]=\{1, 2, ..., n\}$.
For any $\sigma\in S_n$,
since $\sigma$ is bijective,
let
$$i_1=\sigma^{-1}(1), i_2=\sigma^{-1}(2), ..., i_n=\sigma^{-1}(n).$$
Then $\{i_1, i_2, ..., i_n\}=[n]$
and $$\sigma(i_1)=1, \sigma(i_2)=2, ..., \sigma(i_n)=n.$$
as the following figure indicates:
$$\begin{matrix}
[n] & \sigma\in S_n & [n]\\
\hline
i_1 & \longrightarrow & 1 \\
i_2 & \longrightarrow & 2 \\
\vdots & \vdots & \vdots \\
i_n & \longrightarrow & n \\
\end{matrix}$$
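For a concrete instance of this re-indexing: take $\sigma\in S_3$ with $\sigma(1)=2$, $\sigma(2)=3$, $\sigma(3)=1$. Then $i_1=\sigma^{-1}(1)=3$, $i_2=\sigma^{-1}(2)=1$, $i_3=\sigma^{-1}(3)=2$, so $\{i_1,i_2,i_3\}=[3]$ and $\sigma(i_1)=1$, $\sigma(i_2)=2$, $\sigma(i_3)=3$, as claimed.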
Then, using the fact that deleting row $1$ and column $j$ of $A$ gives $[A_{1,j}]_{r,c}=[A]_{r+1,c}$ for $c<j$ and $[A_{1,j}]_{r,c}=[A]_{r+1,c+1}$ for $c\ge j$,
\begin{eqnarray*}
\det{A}
&=& \sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\sum_{\sigma\in S_n}\text{sgn }\sigma\prod_{i=1}^{n}[A_{1,j}]_{i,\sigma(i)}\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\sum_{\sigma\in S_n}\text{sgn }\sigma\prod_{k=1}^{n}[A_{1,j}]_{i_k, \sigma(i_k)}\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\sum_{\sigma\in S_n}\text{sgn }\sigma\prod_{k=1}^{n}[A_{1,j}]_{i_k, k}\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\sum_{\sigma\in S_n}\text{sgn }\sigma\left(\prod_{k=1}^{j-1}[A_{1,j}]_{i_k, k}\prod_{k=j}^{n}[A_{1,j}]_{i_k, k}\right)\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}[A]_{1,j}\sum_{\sigma\in S_n}\text{sgn }\sigma\left(\prod_{k=1}^{j-1}[A]_{i_k+1, k}\prod_{k=j}^{n}[A]_{i_k+1, k+1}\right)\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}\sum_{\sigma\in S_n}\text{sgn }\sigma\left([A]_{1,j}\cdot \prod_{k=1}^{j-1}[A]_{i_k+1, k}\prod_{k=j}^{n}[A]_{i_k+1, k+1}\right)\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}\sum_{\sigma\in S_n}\text{sgn }\sigma \cdot [A]_{1,j}\cdot \underline{[A]_{i_1+1, 1}\cdot [A]_{i_2+1, 2}\cdots [A]_{i_{j-1}+1, j-1}}\cdot \\
&& \underline{[A]_{i_j+1, j+1}\cdot [A]_{i_{j+1}+1, j+2}\cdots [A]_{i_n+1, n+1}}\\
&=& \sum_{j=1}^{n+1}(-1)^{1+j}\sum_{\sigma\in S_n}\text{sgn }\sigma \cdot \underline{[A]_{i_1+1, 1}\cdot [A]_{i_2+1, 2}\cdots [A]_{i_{j-1}+1, j-1}}\cdot \\
&& [A]_{1,j}\cdot \underline{[A]_{i_j+1, j+1}\cdot [A]_{i_{j+1}+1, j+2}\cdots [A]_{i_n+1, n+1}}\\
\end{eqnarray*}
Consider the permutation $\tau_{\sigma}\in S_{n+1}$ defined as follows:
$$\begin{matrix}
[n+1] & \tau_{\sigma}\in S_{n+1} & [n+1]\\
\hline
i_1+1 & \longrightarrow & 1 \\
i_2+1 & \longrightarrow & 2 \\
\vdots & \vdots & \vdots \\
i_{j-1}+1 & \longrightarrow & j-1 \\
1 & \longrightarrow & j \\
i_j+1 & \longrightarrow & j+1 \\
\vdots & \vdots & \vdots \\
i_n+1 & \longrightarrow & n+1 \\
\end{matrix}$$
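For a small concrete case of this construction: take $n=2$, $\sigma$ the identity in $S_2$ (so $i_1=1$, $i_2=2$) and $j=2$. Then $\tau_{\sigma}\in S_3$ sends $i_1+1=2$ to $1$, sends $1$ to $j=2$, and sends $i_2+1=3$ to $3$, i.e. $\tau_{\sigma}=(1\,2)$. Indeed $\tau_{\sigma}(1)=j$ and $\text{sgn }\tau_{\sigma}=-1=(-1)^{1+j}\,\text{sgn }\sigma$, consistent with Lemma 2 below.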
Then the expression above equals
$$\det{A}=\sum_{j=1}^{n+1}(-1)^{1+j}\sum_{\sigma\in S_n}\text{sgn }\sigma\prod_{\ell=1}^{n+1}[A]_{\ell, \tau_{\sigma}(\ell)}.$$
Note that there is a one-to-one correspondence between $\sigma\in S_n$ and $\tau_{\sigma}\in S_{n+1}$ with $\tau_{\sigma}(1)=j$.
By Lemma 2 below,
$\text{sgn }\tau_{\sigma}=(-1)^{1+j}\text{sgn }\sigma$.
Then
$$\det{A}=\sum_{j=1}^{n+1}\sum_{\substack{\tau\in S_{n+1}\\ \tau(1)=j}}\text{sgn }\tau\prod_{\ell=1}^{n+1}[A]_{\ell, \tau(\ell)}
=\sum_{\tau\in S_{n+1}}\text{sgn }\tau\prod_{\ell=1}^{n+1}[A]_{\ell, \tau(\ell)}.$$
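(If you want to sanity-check this identity numerically rather than by induction, here is a minimal Python sketch; the helper names `sgn`, `det_leibniz`, `det_cofactor` are mine, just for illustration.)

```python
# Sanity check (not part of the proof): the first-row cofactor expansion and
# the permutation-sum formula agree on a random integer matrix.
from itertools import permutations
from math import prod
import random

def sgn(p):
    # sign of a permutation given as a tuple of 0-based images
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    n = len(A)
    return sum(sgn(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def det_cofactor(A):
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]    # delete row 1, column j+1
        total += (-1) ** j * A[0][j] * det_cofactor(minor)  # (-1)**j == (-1)**(1+(j+1))
    return total

A = [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
assert det_leibniz(A) == det_cofactor(A)
```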
Lemma 1.
If $\gamma\in S_{n+1}$ is
$$\begin{matrix}
[n+1] & \gamma\in S_{n+1} & [n+1]\\
\hline
1 & \longrightarrow & x_1 \\
2 & \longrightarrow & x_2 \\
\vdots & \vdots & \vdots \\
i & \longrightarrow & x_{i}\\
i+1 & \longrightarrow & x_{i+1}\\
\vdots & \vdots & \vdots \\
n+1 & \longrightarrow & x_{n+1}\\
\end{matrix}$$
then $(x_i, x_{i+1})\gamma$ is
$$\begin{matrix}
[n+1] & (x_i, x_{i+1})\gamma\in S_{n+1} & [n+1]\\
\hline
1 & \longrightarrow & x_1 \\
2 & \longrightarrow & x_2 \\
\vdots & \vdots & \vdots \\
i & \longrightarrow & x_{i+1}\\
i+1 & \longrightarrow & x_{i}\\
\vdots & \vdots & \vdots \\
n+1 & \longrightarrow & x_{n+1}\\
\end{matrix}$$
Lemma 2.
With $\sigma$, $\tau_{\sigma}$ and $j$ as above, $\text{sgn }\tau_{\sigma}=(-1)^{1+j}\,\text{sgn }\sigma$.
To see this, consider $\sigma^{-1}\in S_n$.
We can set $\sigma^{-1}(n+1)=n+1$ to regard it as an element of $S_{n+1}$.
That is,
$$\begin{matrix}
[n+1] & \sigma^{-1}\in S_{n+1} & [n+1]\\
\hline
1 & \longrightarrow & i_1 \\
2 & \longrightarrow & i_2 \\
\vdots & \vdots & \vdots \\
n & \longrightarrow & i_n \\
n+1 & \longrightarrow & n+1
\end{matrix}$$
By Lemma 1, we can left-multiply by a product of $m$ transpositions to put $i_1, i_2, ..., i_n, n+1$ in the right column in increasing order.
In fact,
the product of these transpositions is $\sigma$.
Now apply Lemma 1 to $\tau_{\sigma}^{-1}\in S_{n+1}$ in the same way.
We can left-multiply $\tau_{\sigma}^{-1}$ by $j-1$ transpositions to move $1$ to the top of the right column,
then left-multiply by $m$ transpositions to put $i_1+1, i_2+1, ..., i_n+1$ in the right column in increasing order (the same $m$ as before, since the $i_k+1$ appear in the same relative order as the $i_k$).
$$\begin{matrix}
[n+1] & \tau_{\sigma}^{-1}\in S_{n+1} & [n+1]\\
\hline
1 & \longrightarrow & i_1+1 \\
2 & \longrightarrow & i_2+1 \\
\vdots & \vdots & \vdots \\
j-1 & \longrightarrow & i_{j-1}+1 \\
j & \longrightarrow & 1 \\
j+1 & \longrightarrow & i_j+1 \\
\vdots & \vdots & \vdots \\
n+1 & \longrightarrow & i_n+1 \\
\end{matrix}$$
Suppose that
$s_m \cdots s_2 s_1 t_{j-1} \cdots t_2 t_1\tau_{\sigma}^{-1}=r_m\cdots r_2 r_1\sigma^{-1}=\varepsilon$,
where $s_m, ..., s_2, s_1, t_{j-1}, ..., t_2, t_1, r_m, ..., r_2, r_1$ are all transpositions
and $\varepsilon$ is the identity in $S_{n+1}$.
Therefore,
\begin{eqnarray*}
&&\text{sgn }(s_m \cdots s_2 s_1 t_{j-1} \cdots t_2 t_1\tau_{\sigma}^{-1})=\text{sgn }(r_m\cdots r_2 r_1\sigma^{-1})\\
&\Rightarrow& \text{sgn }(s_m \cdots s_2 s_1)\cdot \text{sgn }(t_{j-1} \cdots t_2 t_1)\cdot \text{sgn }(\tau_{\sigma}^{-1})=\text{sgn }(r_m\cdots r_2 r_1)\cdot \text{sgn }(\sigma^{-1})\\
&\Rightarrow& (-1)^{m}(-1)^{j-1}\text{sgn }(\tau_{\sigma}^{-1})=(-1)^{m}\text{sgn }(\sigma^{-1})\\
&\Rightarrow& (-1)^{j-1}\text{sgn }(\tau_{\sigma}^{-1})=\text{sgn }(\sigma^{-1})\\
&\Rightarrow& (-1)^{j-1}\text{sgn }(\tau_{\sigma})=\text{sgn }(\sigma)\\
&\Rightarrow& (-1)^{1+j}\text{sgn }(\tau_{\sigma})=\text{sgn }(\sigma),
\end{eqnarray*}
which is the relation claimed in Lemma 2.
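(For readers who like to double-check such sign bookkeeping by machine, here is a small Python sketch, with names of my own choosing, that builds $\tau_{\sigma}$ from $\sigma$ and $j$ exactly as in the figure above and verifies the relation for every $\sigma\in S_3$ and every $j$.)

```python
from itertools import permutations

def sgn(p):
    # sign of a permutation given as a dict {x: p(x)} on 1..n, via inversion count
    keys = sorted(p)
    inv = sum(1 for a in range(len(keys)) for b in range(a + 1, len(keys))
              if p[keys[a]] > p[keys[b]])
    return -1 if inv % 2 else 1

n = 3
for images in permutations(range(1, n + 1)):
    sigma = {x: images[x - 1] for x in range(1, n + 1)}   # sigma in S_n
    i = {k: x for x, k in sigma.items()}                  # i[k] = sigma^{-1}(k)
    for j in range(1, n + 2):
        tau = {1: j}                                      # row 1 -> column j
        for k in range(1, j):
            tau[i[k] + 1] = k                             # rows i_k + 1 -> k     (k < j)
        for k in range(j, n + 1):
            tau[i[k] + 1] = k + 1                         # rows i_k + 1 -> k + 1 (k >= j)
        assert sgn(tau) == (-1) ** (1 + j) * sgn(sigma)
```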
- Correct. It is a permutation matrix with entries $a_i$ instead of $1$.
2 and 3. This involves some concepts from group theory. I will state the basic ones and show the others using examples. A permutation is a bijective map from a set $A$ to itself. All permutations on $A$ form a group, denoted by $S_A$, with the operation $\circ$, composition of permutations. A nice property of $S_n$ (the permutation group on $n$ elements) is that every permutation in it can be written as a product of disjoint cycles. (To define a cycle, one has to define orbits and equivalence classes.) For our purpose, I will just use an example to illustrate this statement. Let
$$\sigma=\begin{pmatrix}
1&2&3&4&5&6\\
6&5&2&4&3&1\end{pmatrix}\in S_6.$$
This means the map $\sigma$ maps $1$ to $6$, $2$ to $5$, etc. We see a cycle $1\rightarrow 6\rightarrow 1$, and another cycle $2\rightarrow 5\rightarrow 3\rightarrow 2$. Notice that $4$ does not change under $\sigma$. So we write
$\sigma=(1 6)(2 5 3)$. This is equivalent to $(6 1)(5 3 2)=(6 1)(3 2 5)$. Now let's consider the matrix $T$ corresponding to the permutation $\sigma$.
$$T=\begin{pmatrix}
0&0&0&0&0&a_6\\
0&0&a_3&0&0&0\\
0&0&0&0&a_5&0\\
0&0&0&a_4&0&0\\
0&a_2&0&0&0&0\\
a_1&0&0&0&0&0
\end{pmatrix}$$
Remember that we want to group $a_1, a_6$ and $a_2, a_5, a_3$ together. Notice that moving $a_1, a_6$ to the first two rows and columns involves the same permutation on the rows and on the columns. This is equivalent to multiplying by some permutation matrix $P$ on the left and $P^T$ on the right, which gives a similar matrix. The same goes for the other cycle $(2 5 3)$. Because of the cycle, $a_2$ is placed in row $5$, $a_5$ is placed in row $3$ and $a_3$ is placed in row $2$. So performing the same permutation on rows and columns puts them in the next diagonal block, and again we get a similar matrix. The result is as follows:
$$\left(\begin{array}{cc|ccc|c}
0&a_6&0&0&0&0\\
a_1&0&0&0&0&0\\
\hline
0&0&0&a_3&0&0\\
0&0&0&0&a_5&0\\
0&0&a_2&0&0&0\\
\hline
0&0&0&0&0&a_4
\end{array}\right)$$
Now you can see the first $2\times 2$ block, followed by a $3\times 3$ block. We need to rearrange the letters to make them into the form in (9.3). I will show this using the $3\times 3$ block. We can change $a_2, a_3,a_5$ to $b_1, b_2, b_3$ and we see that this $3\times 3$ block represents a permutation $\tau=(1 3 2)$. If you search for "conjugating a permutation", you will see that there exists a permutation $\delta$, such that $\delta^{-1}\tau\delta=(1 2 3)$, which gives you the form we want. And obviously this gives us a similar matrix. (Note $P_{\sigma\circ \tau}=P_{\sigma}P_{\tau}$.)
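(As a concrete check of the "same permutation on rows and columns" step, here is a short NumPy sketch; the relabeling `pi` below is one particular choice that groups the cycles as above, and all names are mine. Conjugating by the corresponding permutation matrix produces exactly the block form and leaves the determinant unchanged.)

```python
import numpy as np

a = np.array([3., 5., 7., 11., 13., 17.])          # sample values a_1, ..., a_6
sigma = {1: 6, 2: 5, 3: 2, 4: 4, 5: 3, 6: 1}        # sigma = (1 6)(2 5 3)

# T places a_i in row sigma(i), column i, as in the question.
T = np.zeros((6, 6))
for i in range(1, 7):
    T[sigma[i] - 1, i - 1] = a[i - 1]

# One relabeling that sends {1, 6} to {1, 2}, {2, 3, 5} to {3, 4, 5}, and 4 to 6.
pi = {1: 1, 6: 2, 2: 3, 3: 4, 5: 5, 4: 6}
P = np.zeros((6, 6))
for old, new in pi.items():
    P[new - 1, old - 1] = 1.0                       # permutation matrix of the relabeling

B = P @ T @ P.T                                     # same permutation on rows and columns
print(np.round(B, 1))                               # 2x2, 3x3 and 1x1 diagonal blocks
assert np.isclose(np.linalg.det(T), np.linalg.det(B))
```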
- Now let's suppose that $T$ is in the desired form, with $T_1, \dots, T_r$ the blocks. Let $x_1, \dots, x_r$ be any eigenvectors of $T_1, \dots, T_r$, with eigenvalues $\lambda_1, \dots, \lambda_r$, respectively. We see that
$$T\cdot \begin{pmatrix}0\\
\vdots\\
0\\
x_i\\
0\\
\vdots\\
0\end{pmatrix}=\lambda_i\cdot \begin{pmatrix}0\\
\vdots\\
0\\
x_i\\
0\\
\vdots\\
0\end{pmatrix}$$
because of the block structure of $T$. This means the eigenvalues of $T$ consist of exactly the eigenvalues of all the $T_i$ (counted with multiplicity, since the characteristic polynomial of $T$ is the product of those of the $T_i$). Hence the determinant of $T$ is the product of the determinants of the $T_i$.
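(A quick numerical illustration of that last claim, as a sketch rather than a proof: assemble a block-diagonal matrix from a few blocks and compare determinants and spectra.)

```python
import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((k, k)) for k in (2, 3, 1)]   # T_1, T_2, T_3

# Assemble the block-diagonal matrix T from the blocks.
n = sum(b.shape[0] for b in blocks)
T = np.zeros((n, n))
pos = 0
for b in blocks:
    k = b.shape[0]
    T[pos:pos + k, pos:pos + k] = b
    pos += k

# det T equals the product of the block determinants ...
assert np.isclose(np.linalg.det(T),
                  np.prod([np.linalg.det(b) for b in blocks]))
# ... and the spectrum of T is the union of the blocks' spectra.
print(np.linalg.eigvals(T))
print(np.concatenate([np.linalg.eigvals(b) for b in blocks]))
```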
I hope this helps.
Edit:
You can see that (9.3) is the matrix corresponding to the permutation
$$\sigma=\begin{pmatrix}
1&2&3&4&5&6\\
2&3&4&5&6&1\end{pmatrix}\in S_6.$$
Using the cycle notation, it is $(123456)$.
Edit: (To answer the question on $\text{sign}\pi$)
The sign of a permutation is defined to be $-1$ raised to the number of transpositions into which the permutation can be decomposed; the parity of this number does not depend on the decomposition, so the sign is well defined. For example,
$$(123456)=(16)(15)(14)(13)(12)$$
So the sign of $(123456)$ is $(-1)^5=-1$.
$$(16)(253)=(16)(23)(25)$$
So the sign of $(16)(253)$ is $(-1)^3=-1$. Basically, a cycle $(12\cdots n)$ can be written as $(1n)(1, n-1)\cdots(12)$, so its sign is $(-1)^{n-1}$. You can see this in the determinant formula of (9.3). Notice that the sign of a permutation does not depend on how you decompose it into transpositions or how you write it as disjoint cycles. This gives you the formula
$$\det T=(\text{sign}\,\tau)\, a_1\cdots a_n$$
where $\tau$ is any permutation and $T$ is the corresponding matrix.
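(If it helps, here is a tiny Python check of this formula for the earlier example $\sigma=(16)(253)$; the helper `sign_from_cycles` is mine: each cycle of length $L$ contributes $L-1$ transpositions.)

```python
import numpy as np

def sign_from_cycles(cycles):
    # a cycle of length L decomposes into L - 1 transpositions
    return (-1) ** sum(len(c) - 1 for c in cycles)

a = np.array([3., 5., 7., 11., 13., 17.])
cycles = [(1, 6), (2, 5, 3)]                     # sigma = (16)(253); 4 is fixed
sigma = {4: 4}
for c in cycles:
    for x, y in zip(c, c[1:] + c[:1]):
        sigma[x] = y                             # x -> next element of its cycle

# T places a_i in row sigma(i), column i.
T = np.zeros((6, 6))
for i in range(1, 7):
    T[sigma[i] - 1, i - 1] = a[i - 1]

assert np.isclose(np.linalg.det(T),
                  sign_from_cycles(cycles) * np.prod(a))
```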
Now for the $\pi$, the author is just saying that for our specific $T$, there is only one permutation, namely $\tau$, that contributes to the summation. All the other terms are zero. He put it in this general form so that he can make a guess for the general case, which he does in Proposition 9.5. So $\text{sign}\pi$ is just the $\text{sign}\tau$ we discussed in the previous paragraph.
Best Answer
Yes, it is a convention, based on the usual definition of the cross product in $3$-D space referred to a right-handed ordered basis. This convention gives a positive sign to the volume of a cube whose sides are oriented as a right-handed basis.
Changing the convention simply changes the sign of the determinant. And the determinant is the signed volume of the $n$-dimensional parallelepiped spanned by the column (or row) vectors of the matrix.
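For instance, in the plane the determinant $\begin{vmatrix}u_1 & v_1\\ u_2 & v_2\end{vmatrix}=u_1v_2-u_2v_1$ is the signed area of the parallelogram spanned by $u$ and $v$: for the right-handed pair $u=(1,0)$, $v=(0,1)$ it equals $+1$, while swapping the two columns (the same parallelogram, opposite orientation) gives $-1$.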