[Math] Why does computation of the determinant have anything to do with the “volume” of the fundamental “parallelepiped”

Tags: determinant, linear algebra, matrices, volume

By the "volume" of the "parallelepiped", I mean the Lebesgue measure of the $n$-parallelotope.

If I have $$\vec{v_i}=\begin{bmatrix}a_{1i} \\ a_{2i} \\ \vdots \\ a_{ni}\end{bmatrix} \qquad \text{ for } i\in\{1,2,3\ldots,n\}$$
and $$\mathbf A=\begin{bmatrix}\vec{v_1} & \vec{v_2} & \cdots & \vec{v_n}\end{bmatrix}=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}$$
Now there are two ways to define the determinant.

Definition 1: If $\mathbf C_{ij}$ is the cofactor, then $$\det(\mathbf A)=\sum_{k=1}^{n}a_{ik}C_{ik}=\sum_{k=1}^{n}a_{kj}C_{kj} \qquad \text{for any } i,j\in\{1,2,3,\ldots,n\}$$
Definition 2: $\det(\mathbf A)$ is the Lebesgue measure of the fundamental $n$-parallelotope spanned by the column vectors $\vec{v_i}\in\mathbb R^n$.

How do I prove that the two definitions are equivalent?
I personally like Definition 2 because I can visualize it, whereas for Definition 1 we first need to show that the summations give the same value for every choice of $i$ and $j$.

I can use the second definition for $n=2$. The first thing I noted was that adding a multiple of one column to another does not change the area of the parallelogram, by simple geometry. Thus, $$\begin{vmatrix}a & c\\ b & d\end{vmatrix}=\begin{vmatrix}a & c-a\frac{c}{a}\\ b & d-b\frac{c}{a}\end{vmatrix}=\begin{vmatrix}a & 0\\ b & \frac{ad-bc}{a}\end{vmatrix}=\begin{vmatrix}a-0\frac{ab}{ad-bc} & 0\\ b-\frac{ad-bc}{a}\frac{ab}{ad-bc} & \frac{ad-bc}{a}\end{vmatrix}=\begin{vmatrix}a & 0\\ 0 & \frac{ad-bc}{a}\end{vmatrix}$$ This turns the parallelogram into a rectangle whose area can be calculated easily.
$$\begin{vmatrix}a & c\\ b & d\end{vmatrix}=ad-bc$$
But this is the same value we get from Definition 1. Thus the two definitions are equivalent for $n=2$.
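
As a quick numerical sanity check of this shear argument (a sketch using NumPy; the particular entries are arbitrary), one can verify that subtracting a multiple of one column from the other leaves the determinant, and hence the area, unchanged:

    import numpy as np

    # Arbitrary 2x2 example with columns (a, b) and (c, d).
    a, b, c, d = 3.0, 1.0, 2.0, 5.0
    A = np.array([[a, c],
                  [b, d]])

    # Shear: subtract (c/a) times column 1 from column 2.  Geometrically this
    # slides one side of the parallelogram parallel to the other side,
    # so the area is unchanged.
    A_sheared = A.copy()
    A_sheared[:, 1] -= (c / a) * A_sheared[:, 0]

    print(np.linalg.det(A), np.linalg.det(A_sheared), a * d - b * c)
    # All three agree up to floating point: 13.0 13.0 13.0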

The argument for finding the determinant for $n=2$ by the second definition generalizes easily, but the method of computation feels completely different from Definition 1. For example, for $n=3$ I got $$\begin{vmatrix}a&d&g\\b&e&h\\c&f&i\end{vmatrix}=\begin{vmatrix}\frac{a(ei-hf)-d(bi-ch)-g(ec-bf)}{ei-hf}&0&0\\0&\frac{ei-hf}{i}&0\\0&0&i\end{vmatrix}=a(ei-hf)-d(bi-ch)+g(bf-ec)$$ I can see a little bit of a connection with the case $i=1$ (expansion along the first row) in Definition 1.

I got to know that Definition 1 is called the Laplace expansion, but the proof written on Wikipedia went over my head. I am in 11th grade and know very little about linear algebra (only what Grant Sanderson covers in his Essence of Linear Algebra playlist). After reading the answer to "Determinant of transpose" I can also make sense of why row operations do not change the determinant. I would be really happy if someone proved Definition 1 using Definition 2.

Best Answer

$\def\vect{\mathbf} \DeclareMathOperator{\Mat}{\rm{Mat}} \newcommand{\vol}{{\rm{vol}}} \newcommand\sign{{\rm{sign}}} $

Since you say you haven't yet studied linear algebra, you may have to take this explanation as a study program for the (near) future rather than something you can take in at one go.

Step 1. Forget both of your definitions of determinant. The determinant is something intrinsically associated with matrices (or with linear transformations) and is determined by certain properties. We regard a determinant alternatively as a function of matrices or as a function of a sequence of $n$ column vectors in $\mathbb R^n$. We go back and forth between $n$-by-$n$ matrices and sequences of column vectors in the obvious way. The defining properties:

(1) $\det(v_1, v_2, \dots, v_n)$ is a multilinear function of the $n$ vector variables. That is, it is linear in each vector variable separately.

(2) $\det(v_1, v_2, \dots, v_n)$ is alternating; that means that if you switch any two vector variables, the result changes by a factor of $-1$.

(3) Normalization: $\det(E) = 1$, where $E$ is the identity matrix.

The last property says for the standard unit vectors $\vect{e}_1, \dots, \vect{e}_n$, $\det(\vect{e}_1, \dots, \vect{e}_n) = 1$.

Theorem: There is a unique function $\det$ on $\Mat_n(\mathbb R)$ satisfying the three properties listed above. Moreover \begin{equation} \det(A) = \sum_{\sigma \in S_n} \epsilon(\sigma) a_{1, \sigma(1)} a_{2, \sigma(2) }\cdots a_{n, \sigma(n)}. \tag{S} \end{equation}

In the theorem statement, the sum is over the symmetric group $S_n$, and $\epsilon(\sigma)$ denotes the sign of the permutation $\sigma$.
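
For small matrices formula (S) can be evaluated directly. Here is a brute-force sketch in Python/NumPy (the helper names `perm_sign` and `leibniz_det` are mine, not standard library functions), compared against `numpy.linalg.det`:

    import numpy as np
    from itertools import permutations

    def perm_sign(sigma):
        # Sign of a permutation (given as a tuple of indices) via its inversion count.
        inversions = sum(1 for i in range(len(sigma))
                           for j in range(i + 1, len(sigma))
                           if sigma[i] > sigma[j])
        return -1 if inversions % 2 else 1

    def leibniz_det(A):
        # Formula (S): sum over all permutations of signed products of entries.
        n = A.shape[0]
        return sum(perm_sign(sigma) * np.prod([A[i, sigma[i]] for i in range(n)])
                   for sigma in permutations(range(n)))

    A = np.random.rand(5, 5)
    print(leibniz_det(A), np.linalg.det(A))   # agree up to rounding error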

Now using only the three defining properties, one can show the following. Let $A$ be an $n$-by-$n$ matrix, and define $A_{i, j}$ to be the matrix obtained by striking out the $i$--th row and $j$--th column of $A$, so $A_{i, j}$ is $(n-1)$-by-$(n-1)$. Let $\mathcal C(A)$ be the matrix whose $(i, j)$ entry is $(-1)^{i+j} \det(A_{i, j})$. Then \begin{equation} A \mathcal C(A)^t = \mathcal C(A)^t A = \det(A) E. \tag{L} \end{equation} This statement encompasses all the Laplace expansions of $\det(A)$ and some orthogonality relations as well. So don't take the Laplace expansion as a definition but as a consequence of the intrinsic definition.
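
The identity (L) is also easy to test numerically. A sketch (the helpers `minor` and `cofactor_matrix` are my own names) that builds $\mathcal C(A)$ entry by entry:

    import numpy as np

    def minor(A, i, j):
        # A with its i-th row and j-th column struck out (0-indexed).
        return np.delete(np.delete(A, i, axis=0), j, axis=1)

    def cofactor_matrix(A):
        # C(A): the (i, j) entry is (-1)**(i + j) * det of the (i, j) minor.
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor(A, i, j))
        return C

    A = np.random.rand(4, 4)
    C = cofactor_matrix(A)
    d = np.linalg.det(A)
    print(np.allclose(A @ C.T, d * np.eye(4)))   # True:  A C(A)^t = det(A) E
    print(np.allclose(C.T @ A, d * np.eye(4)))   # True:  C(A)^t A = det(A) E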

Some other well-known properties of determinants that will be used are:

  • $\det(AB) = \det(A) \det(B)$

  • $\det(A^t) = \det(A)$

These can be derived using the defining properties or the summation formula (S).
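
Both properties are easy to spot-check numerically as well (a short NumPy sketch with random matrices):

    import numpy as np

    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    # det(AB) = det(A) det(B)
    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
    # det(A^t) = det(A)
    print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                       # True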

Step 2. We need a working definition of volume and signed volume. We want to define for $1 \le r \le n$ the (non-negative) $r$ dimensional volume of the parallelepiped spanned by $r$ vectors $v_1, \dots, v_r$ in $\mathbb R^n$, denoted $|\vol_r|(v_1, \dots, v_r)$. We start with $|\vol_1|(v_1) = ||v_1||$. Supposing that $r \ge 2$ and that $r-1$ dimensional volume has been defined, we do the following. If $v_1, \dots, v_r$ are linearly dependent, define $|\vol_r|(v_1, \dots, v_r) = 0$. Otherwise, apply the Gram-Schmidt procedure to the sequence of vectors $v_1, \dots, v_r$ to get an orthonormal basis $\vect f_1, \dots, \vect f_r$ of the subspace $M$ spanned by $v_1, \dots, v_r$. Note that the dot product $(v_r, \vect f_r)$ is positive. In fact, it is the length of the projection of $v_r$ onto the orthogonal complement in $M$ of the span of $v_1, \dots, v_{r-1}$ (exercise). Define $$ |\vol_r|(v_1, \dots, v_r) = (v_r, \vect f_r) \cdot |\vol_{r-1}|(v_1, \dots, v_{r-1}) . $$ We write $|\vol|$ for $|\vol_n|$. This is the procedure we know from elementary mathematics: we take the $(r-1)$ dimensional volume of the base and multiply it by the one dimensional altitude of the parallelepiped.

It is not manifest that the result is independent of the order in which the vectors $v_1, \dots, v_r$ are listed. This will emerge later.

Finally we can define the $n$ dimensional signed volume $\vol(v_1, \dots, v_n)$ for a sequence of $n$ vectors as follows. If the vectors are linearly dependent, the answer is $0$. Otherwise,
take $$ \vol(v_1, \dots, v_n) = \sign(\det(v_1, \dots, v_n)) |\vol|(v_1, \dots, v_n). $$

This is related to a notion of orientation. An ordered basis of $\mathbb R^n$ is said to be positively oriented if $\det(v_1, \dots, v_n)$ is positive and negatively oriented otherwise. So the signed volume is positive if the basis is positively oriented and negative if the basis is negatively oriented. In dimension $3$, orientation can be described in terms of the familiar right-hand rule.
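
The recursive definition translates almost literally into code. The sketch below (the helper names are mine, and Gram-Schmidt is written out by hand rather than taken from a library) computes $|\vol_r|$ as base times altitude and then attaches the orientation sign to get the signed volume; reordering the edge vectors can flip the sign, as described above.

    import numpy as np

    def unsigned_vol(vectors):
        # |vol_r|: recursively, the (r-1)-dimensional volume of the base
        # times the 1-dimensional altitude (v_r, f_r).
        vectors = [np.asarray(v, dtype=float) for v in vectors]
        if len(vectors) == 1:
            return np.linalg.norm(vectors[0])
        # Gram-Schmidt on v_1, ..., v_r; the last vector f_r points along the
        # part of v_r orthogonal to span(v_1, ..., v_{r-1}).
        basis = []
        for v in vectors:
            w = v - sum(np.dot(v, f) * f for f in basis)
            norm = np.linalg.norm(w)
            if norm < 1e-12:                       # linearly dependent: volume 0
                return 0.0
            basis.append(w / norm)
        altitude = np.dot(vectors[-1], basis[-1])  # (v_r, f_r) > 0
        return altitude * unsigned_vol(vectors[:-1])

    def signed_vol(vectors):
        # vol: unsigned volume with the orientation sign taken from det.
        A = np.column_stack(vectors)
        return np.sign(np.linalg.det(A)) * unsigned_vol(vectors)

    # A box with edge lengths 2, 3, 5 has volume 30; listing the edges in the
    # opposite order reverses the orientation, so the signed volume flips sign.
    edges = [np.array([2, 0, 0]), np.array([0, 3, 0]), np.array([0, 0, 5])]
    print(unsigned_vol(edges))                         # 30.0
    print(signed_vol(edges), signed_vol(edges[::-1]))  # 30.0 -30.0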

Step 3. Let's discuss orthogonal matrices a little. A matrix $U$ is orthogonal if $U^t U = U U^t = E$. This is so if and only if $U$ preserves all dot products, that is, $(U u, Uv) = (u, v)$ for all $u, v$; equivalently, if and only if the columns of $U$ form an orthonormal basis of $\mathbb R^n$. An orthogonal matrix has determinant equal to $\pm 1$ because $$\det(U)^2 = \det(U^t) \det (U) = \det(U^t U) = \det(E) = 1.$$

Orthogonal matrices with determinant 1 are called special orthogonal matrices.

Observation: If $U$ is an orthogonal matrix, then for any $r \le n$ and any $v_1, \dots, v_r$, $$ |\vol_r|(Uv_1, \dots, Uv_r) = |\vol_r|(v_1, \dots, v_r). $$ Moreover, for any $v_1, \dots, v_n$, $$ \vol(Uv_1, \dots, Uv_n) = \det(U) \vol(v_1, \dots, v_n). $$ In particular if $U$ is special orthogonal, then $$ \vol(Uv_1, \dots, Uv_n) = \vol(v_1, \dots, v_n). $$

Proof: follows from the definitions because orthogonal matrices preserve inner products, and because $$\det(U v_1, \dots, U v_n) = \det (U) \det(v_1, \dots, v_n).$$
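
A numerical illustration of the Observation and its proof (a sketch; the random orthogonal matrix is obtained from a QR factorization):

    import numpy as np

    n = 4
    U, _ = np.linalg.qr(np.random.rand(n, n))   # random orthogonal matrix
    V = np.random.rand(n, n)                    # columns v_1, ..., v_n

    # U preserves all inner products: the Gram matrix of (U v_i) equals that of
    # (v_i), so every Gram-Schmidt step, and hence |vol_r|, is unchanged.
    print(np.allclose((U @ V).T @ (U @ V), V.T @ V))         # True

    # det(U v_1, ..., U v_n) = det(U) det(v_1, ..., v_n), and det(U) = ±1,
    # so the signed volume changes by exactly the factor det(U).
    print(np.isclose(np.linalg.det(U @ V),
                     np.linalg.det(U) * np.linalg.det(V)))   # True
    print(np.isclose(abs(np.linalg.det(U)), 1.0))            # True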

Step 4. Let's take some sequence of vectors $(b_1, \dots, b_n)$ in special position and compare $\det(b_1, \dots, b_n)$ and $\vol(b_1, \dots, b_n)$. The special assumption is that $b_1, \dots, b_{n-1}$ have zero last coordinate, so they lie in the span of $\vect e_1, \dots, \vect e_{n-1}$, or equivalently are perpendicular to $\vect{e}_n$. Let $B = (b_1, \dots, b_n)$. The last row of $B$ has non-zero $(n, n)$ entry $b_{n, n}$ and the rest of its entries are zero. $B$ is thus block triangular: $$ B = \begin{bmatrix} B_{n, n} & * \\ 0 & b_{n, n} \end{bmatrix}, $$
where $B_{n,n}$ is $(n-1)$-by-$(n-1)$, $*$ indicates an $(n-1)$-by-$1$ column, and $0$ a $1$-by-$(n-1)$ row of zeros.

In this special situation,
$$ \det B = \det(b_1, \dots, b_n) = b_{n,n} \det (B_{n,n}), $$ and this is the Laplace expansion along the last column.

Now consider the computation of volume and signed volume. We have $$ |\vol_{n-1}|(b_1, \dots, b_{n-1}) = |\vol_{n-1}|(B_{n, n}), $$ as follows from the definitions. Since the span of $b_1, \dots, b_{n-1}$ is the same as the span of $\vect e_1, \dots, \vect e_{n-1}$, the final Gram-Schmidt vector $\vect f_n$ entering into the computation of $|\vol_n|(b_1, \dots, b_n)$ is necessarily $(\pm 1) \vect e_n$, and $(b_n, \vect f_n)$ is $|b_{n,n}|$. Thus $$ |\vol_n|(b_1, \dots, b_n) = |b_{n, n}| \, |\vol_{n-1}|(B_{n, n}). $$ By induction on dimension, we may assume $$ \det(B_{n, n}) = \vol_{n-1}(B_{n, n}) \quad \text{and therefore} \quad |\det(B_{n, n})| = |\vol_{n-1}|(B_{n, n}). $$ Substituting, $$ |\vol_n|(b_1, \dots, b_n) = |b_{n, n}| \, |\vol_{n-1}|(B_{n, n}) = |b_{n, n}| \, |\det(B_{n, n})| = |\det(B)|. $$ But then, \begin{align*} \vol(b_1, \dots, b_n) &= \sign(\det(b_1, \dots, b_n)) |\vol_n|(b_1, \dots, b_n) \\ &= \sign(\det(b_1, \dots, b_n)) |\det(b_1, \dots, b_n)| \\ &= \det(b_1, \dots, b_n). \end{align*}
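
A quick numerical check of this special case (a sketch; `B` is a random matrix whose last row is zeroed out except for the $(n, n)$ entry):

    import numpy as np

    n = 4
    B = np.random.rand(n, n)
    B[n - 1, :n - 1] = 0.0        # b_1, ..., b_{n-1} have zero last coordinate

    B_nn = B[:n - 1, :n - 1]      # strike out the last row and last column
    b_nn = B[n - 1, n - 1]

    # det B = b_{n,n} det(B_{n,n}): the Laplace expansion collapses to one term.
    print(np.isclose(np.linalg.det(B), b_nn * np.linalg.det(B_nn)))   # True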

Step 5. Let's reduce the general problem to the special case just considered. Let's start with $A = (a_1, \dots, a_n)$. Take any special orthogonal matrix $U$. Then we already know that $$ \det U A = \det(U a_1, \dots, U a_n) = \det U \det A = \det A, $$ and $$ \text{vol}( U A) = \text{vol}(U a_1, \dots, U a_n) = \text{vol}(a_1, \dots, a_n) = \text{vol}(A). $$ Now it is always possible to find a special orthogonal matrix $U$ such that $(U a_1, \dots, U a_{n-1})$ is a system of vectors in the coordinate hyperplane orthogonal to $\vect{e}_n$, and for such a choice, $\text{vol}( U A) = \det (UA)$ by Step 4. Thus for our original $A = (a_1, \dots, a_n)$, $$ \text{vol}(A) = \text{vol}(UA) = \det(UA) = \det A. $$
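
One concrete way to produce such a $U$ (a sketch; this particular construction through a complete QR factorization is my own choice, not the only possible one): take an orthonormal basis whose first $n-1$ vectors span the same subspace as $a_1, \dots, a_{n-1}$, use its transpose as $U$, and flip one row if necessary so that $\det U = +1$.

    import numpy as np

    n = 4
    A = np.random.rand(n, n)      # columns a_1, ..., a_n

    # Complete QR of the first n-1 columns: Q is orthogonal and its first n-1
    # columns span span(a_1, ..., a_{n-1}), so U = Q^t sends that subspace into
    # the coordinate hyperplane spanned by e_1, ..., e_{n-1}.
    Q, _ = np.linalg.qr(A[:, :n - 1], mode='complete')
    U = Q.T.copy()
    if np.linalg.det(U) < 0:
        U[-1, :] *= -1            # still orthogonal, now with det(U) = +1

    B = U @ A
    print(np.allclose(B[n - 1, :n - 1], 0.0))               # True: special position
    print(np.isclose(np.linalg.det(U), 1.0))                # True: U is special orthogonal
    print(np.isclose(np.linalg.det(B), np.linalg.det(A)))   # True: det(UA) = det(A)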

Step 6. So far we have shown that $\text{vol}(A) = \det A$ for any $A = (v_1, \dots, v_n)$, but we haven't emphasized the Laplace expansions. You particularly wanted the Laplace expansions to be naturally interpreted as signed volumes. In some sense, there is nothing to show, because we know that the determinant is calculated by Laplace expansions and is on the other hand equal to the volume. But we do get for free a geometric interpretation of the Laplace expansion. Let's take a particular expansion, that along the first column: $$ \text{vol}(A) = \det(A) = \sum_i a_{i, 1} (-1)^{i+1} \det(A_{i, 1}) = (v_1, C), $$ where $C$ is the vector whose $i$-th coordinate is $(-1)^{i+1} \det(A_{i, 1})$. If we keep $v_2, \dots, v_n$ and replace $v_1$ by any vector $w$ in the hyperplane spanned by $v_2, \dots, v_n$, we get $$ 0 = \det ( w, v_2, \dots, v_n) = (w, C), $$ which means that the vector $C$ is perpendicular to the hyperplane spanned by $v_2, \dots, v_n$. If we keep $v_2, \dots, v_n$ and replace $v_1$ by $C$, we get $$ \det ( C, v_2, \dots, v_n) = (C, C) = ||C||^2 > 0, $$ which means that $(C, v_2, \dots, v_n)$ is positively oriented. Now if we replace $v_1$ by the unit vector $u = C/||C||$, we get $$ \det ( u, v_2, \dots, v_n) = (u, C) = ||C||. $$ But since $u$ is perpendicular to the hyperplane spanned by $( v_2, \dots, v_n)$, of length $1$, and $(u, v_2, \dots, v_n)$ is positively oriented, we have $$ \text{vol}(u, v_2, \dots, v_n) = |\text{vol}_{n-1}|(v_2, \dots, v_n), $$ the $(n-1)$ dimensional volume of $(v_2, \dots, v_n)$. Thus the length of $C$ is the $(n-1)$ dimensional volume of $(v_2, \dots, v_n)$.

To summarize: in the Laplace expansion along the first column, the vector $C$ appearing in $\det (v_1, \dots, v_n) = (v_1, C)$ is perpendicular to the hyperplane spanned by $(v_2, \dots, v_n)$, has length equal to the $(n-1)$ dimensional volume of $(v_2, \dots, v_n)$, and determines a positively oriented system of vectors $(C, v_2, \dots, v_n)$; and the $n$ dimensional volume is given by $\text{vol}(v_1, \dots, v_n) = (v_1, C)$.
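
All of these claims about $C$ can be tested numerically in a few lines (a sketch; the `minor` helper is as above, and the $(n-1)$ dimensional volume of $(v_2, \dots, v_n)$ is computed here via the standard Gram-determinant formula $\sqrt{\det(G)}$, a known fact not derived in this answer but consistent with the recursive definition of Step 2):

    import numpy as np

    def minor(A, i, j):
        # A with its i-th row and j-th column struck out (0-indexed).
        return np.delete(np.delete(A, i, axis=0), j, axis=1)

    n = 4
    A = np.random.rand(n, n)      # columns v_1, ..., v_n
    V_rest = A[:, 1:]             # v_2, ..., v_n

    # First-column cofactors: C_i = (-1)^(i+1) det(A_{i,1}) (0-indexed: (-1)^i).
    C = np.array([(-1) ** i * np.linalg.det(minor(A, i, 0)) for i in range(n)])

    # C is perpendicular to v_2, ..., v_n ...
    print(np.allclose(V_rest.T @ C, 0.0))                              # True
    # ... the expansion along the first column reads det(A) = (v_1, C) ...
    print(np.isclose(np.linalg.det(A), A[:, 0] @ C))                   # True
    # ... and ||C|| is the (n-1)-dimensional volume of (v_2, ..., v_n).
    gram = V_rest.T @ V_rest
    print(np.isclose(np.linalg.norm(C), np.sqrt(np.linalg.det(gram)))) # True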
