The book I'm working through (Calculus and Analysis in Euclidean Space by Shurman) is introducing the determinant in a characterizing, descriptive way, rather than just handing out its algorithm. Therefore, as I have to show that for a square echelon matrix $E$ it holds that $\det{E} = \det{E^T}$, I'm guessing that the proof expects the use of the determinant as a function of the rows of the input matrix. So without further ado: if $E = I$, then $\det{E} = \det{I} = 1$, as the determinant is normalized. But what if $E \neq I$? Then necessarily at least the last row of $E$ is all zeros, and since the zero row equals $0 \cdot r$ for any row $r$, multilinearity in that row gives $\det{(E)} = \det{(r_1,\dots,r_{n-1},0)} = 0 \cdot \det{(r_1,\dots,r_{n-1},r)} = 0$.
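As a quick numeric sanity check (not part of the proof), a square echelon matrix with a zero last row, chosen here purely for illustration, and its transpose can be verified with NumPy to both have determinant zero:

```python
import numpy as np

# A square echelon matrix E != I: its last row is all zeros,
# so by multilinearity in that row det(E) = 0.
E = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [0.0, 0.0, 0.0]])

assert abs(np.linalg.det(E)) < 1e-9    # det(E)   = 0
assert abs(np.linalg.det(E.T)) < 1e-9  # det(E^T) = 0 as well
```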
But how do you argue about the transpose? So far the "only" tools that have been given are the elementary matrices $S_{i, a}, T_{i;j}, R_{i;j,a}$, which respectively scale the $i$th row by $a$, swap the $i$th and $j$th rows, and add to the $i$th row the $j$th row scaled by $a \in \mathbb{R}$. It can be assumed that the determinants of these matrices equal the determinants of their transposes, and that $\det{S_{i, a}} = a$, $\det{T_{i;j}} = -1$, $\det{R_{i;j,a}} = 1$.
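A small NumPy sketch (0-based indices, with illustrative choices of $i$, $j$, and $a$ assumed here) can confirm these determinant values, and that each elementary matrix has the same determinant as its transpose:

```python
import numpy as np

n, a = 4, 2.5

# S_{i,a}: scale row i by a (here i = 1)
S = np.eye(n); S[1, 1] = a
# T_{i;j}: swap rows i and j (here i = 0, j = 2)
T = np.eye(n); T[[0, 2]] = T[[2, 0]]
# R_{i;j,a}: add a times row j to row i (here i = 0, j = 3)
R = np.eye(n); R[0, 3] = a

for M, expected in [(S, a), (T, -1.0), (R, 1.0)]:
    assert np.isclose(np.linalg.det(M), expected)
    assert np.isclose(np.linalg.det(M), np.linalg.det(M.T))
```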
Best Answer
Based on what you've said, you know that the determinant is alternating, multilinear, and satisfies $\det(I) = 1$ (it can be shown that these three properties uniquely characterize the determinant). Thus we can expand the determinant using multilinearity in the straightforward way: \begin{align} \det(A) &= \det(a_1, \dots, a_n) \\ &= \sum_{i_1 = 1}^{n}a_{i_1, 1}\det(e_{i_1}, a_2, \dots, a_n) \\ &= \sum_{i_1 = 1}^{n}\sum_{i_2 = 1}^{n}a_{i_1, 1}a_{i_2, 2}\det(e_{i_1}, e_{i_2}, a_3, \dots, a_n) \\ &= \dots \\ &= \sum_{i_1, \dots, i_n = 1}^{n}a_{i_1, 1}\dots a_{i_n, n}\det(e_{i_1}, \dots, e_{i_n}). \end{align}

By the alternating property, if two columns of a matrix are equal, then its determinant is $0$. Thus we only need to sum over the permutations of $\{1, \dots, n\}$: \begin{align} \det(A) &= \sum_{i_1, \dots, i_n = 1}^{n}a_{i_1, 1}\dots a_{i_n, n}\det(e_{i_1}, \dots, e_{i_n}) \\ &= \sum_{\sigma \in S_n}a_{\sigma(1), 1}\dots a_{\sigma(n), n}\det(e_{\sigma(1)}, \dots, e_{\sigma(n)}). \end{align}

By the alternating property and the property $\det(I) = 1$, we have $\det(e_{\sigma(1)}, \dots, e_{\sigma(n)}) = (-1)^{\sigma}\det(e_1, \dots, e_n) = (-1)^\sigma$ for any $\sigma \in S_n$, where $(-1)^{\sigma}$ denotes the sign of $\sigma$ (note $(-1)^{\sigma} = (-1)^{\sigma^{-1}}$). Thus \begin{align} \det(A) &= \sum_{\sigma \in S_n}a_{\sigma(1), 1}\dots a_{\sigma(n), n}(-1)^{\sigma}. \end{align}

With this general formula in hand, proving $\det(A) = \det(A^T)$ is easy. We have \begin{align} \det(A) &= \sum_{\sigma \in S_n}a_{\sigma(1), 1}\dots a_{\sigma(n), n}(-1)^{\sigma} \\ &= \sum_{\sigma \in S_n}a_{1, \sigma^{-1}(1)}\dots a_{n, \sigma^{-1}(n)}(-1)^{\sigma} \\ &= \sum_{\sigma \in S_n}a_{1, \sigma^{-1}(1)}\dots a_{n, \sigma^{-1}(n)}(-1)^{\sigma^{-1}} \\ &= \sum_{\tau \in S_n}a_{1, \tau(1)}\dots a_{n, \tau(n)}(-1)^{\tau} \\ &= \det(A^T). \end{align}
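To make the derivation concrete, here is a brute-force sketch of the permutation-sum formula $\det(A) = \sum_{\sigma \in S_n} (-1)^{\sigma} a_{\sigma(1),1}\cdots a_{\sigma(n),n}$, checked against NumPy and against the transpose (the `sign` helper and the random test matrix are my own illustrative choices):

```python
import itertools
import numpy as np

def sign(perm):
    # Sign of a permutation via its inversion count: (-1)^{#inversions}.
    n = len(perm)
    inv = sum(1 for i in range(n) for j in range(i + 1, n)
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def leibniz_det(A):
    # Sum over all permutations p: sign(p) * a_{p(1),1} * ... * a_{p(n),n}.
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[i], i] for i in range(n)])
               for p in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

assert np.isclose(leibniz_det(A), np.linalg.det(A))   # formula matches det
assert np.isclose(leibniz_det(A), leibniz_det(A.T))   # det(A) = det(A^T)
```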