This identity can be computed directly from the definition of the determinant. It helps to regard the entries of $A+tB$ as differentiable functions of the real parameter $t$. Recall the definition of the determinant:
$\det A = \displaystyle\sum_{\sigma \in S_n} \textrm{ sgn } (\sigma ) \displaystyle\prod_{i=1}^n a_{i\sigma (i) }$
and that the cofactor $C_{ij}$ of $A$ is $(-1)^{i+j}$ times the determinant of the matrix formed by deleting row $i$ and column $j$; $\textrm{Adj}\, A$ is the transpose of the cofactor matrix.
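(For the concretely minded, here is a small numerical sanity check of the permutation-sum formula against a library determinant; this is a numpy sketch, and the helper names `sgn` and `leibniz_det` are just illustrative, not standard API.)

```python
# Sanity check: the permutation-sum (Leibniz) formula agrees with numpy's det.
import numpy as np
from itertools import permutations

def sgn(p):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    """det A = sum over sigma of sgn(sigma) * prod_i a_{i, sigma(i)}."""
    n = A.shape[0]
    return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.random.default_rng(0).standard_normal((4, 4))
print(leibniz_det(A), np.linalg.det(A))   # the two values should agree
```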
Now, by the product rule,
$\frac{d}{dt} \det (A +tB) = \frac{d}{dt}\displaystyle\sum_{\sigma \in S_n } \textrm{sgn } (\sigma) \displaystyle\prod_{i=1}^n [a + tb]_{i\sigma (i)} = \displaystyle\sum_{\sigma \in S_n} \textrm{sgn }(\sigma ) \displaystyle\sum_{j=1}^n b_{j \sigma (j) } \displaystyle\prod_{i\ne j} [a + tb]_{i \sigma (i) }.$
Letting $t=0$ gives
$\frac{d}{dt} \det (A+tB)\Big|_{t=0} = \displaystyle\sum_{j=1}^n \displaystyle\sum_{\sigma \in S_n} \textrm{sgn } (\sigma )\, b_{j \sigma (j)} \displaystyle\prod_{i\ne j} a_{i \sigma (i) } = \displaystyle\sum_{j=1}^n \displaystyle\sum_{k=1}^n b_{jk} \displaystyle\sum_{\sigma (j) = k } \textrm{sgn }(\sigma ) \displaystyle\prod_{i\ne j} a_{i \sigma (i) } = \displaystyle\sum_{j=1}^n \displaystyle\sum_{k=1}^n b_{jk}\, C_{jk} = \textrm{tr }(\textrm{Adj }(A)\, B),$
since the inner sum over permutations with $\sigma(j)=k$ is exactly the cofactor $C_{jk}$, and $(\textrm{Adj }A)_{kj} = C_{jk}$.
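(If you want to see the identity numerically, here is a small numpy sketch comparing a finite-difference derivative with $\textrm{tr}(\textrm{Adj}(A)B)$; the `adjugate` helper below is mine, built directly from signed minors.)

```python
# Numerical check of  d/dt det(A + tB) |_{t=0} = tr(Adj(A) B)
# via a central finite difference.
import numpy as np

def adjugate(A):
    """Transpose of the cofactor matrix, built from signed minors."""
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

h = 1e-6
lhs = (np.linalg.det(A + h * B) - np.linalg.det(A - h * B)) / (2 * h)
rhs = np.trace(adjugate(A) @ B)
print(lhs, rhs)   # should agree to several digits
```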
What's sort of neat is that this identity can be used to prove Cramer's Rule. Note that as a special case, $\frac{d}{dt} \det(I+ tB)\Big|_{t=0} = \textrm{tr} (B).$ Hence, for invertible $A$,
$\frac{d}{dt} \det (A +tB)\Big|_{t=0} = \det (A)\, \frac{d}{dt} \det (I +tA^{-1}B)\Big|_{t=0} = \det (A)\, \textrm{tr} (A^{-1} B).$
Note that the $i$-th diagonal entry of $A^{-1} B$ is $\displaystyle\sum_{k=1}^n a^{-1}_{ik} b_{ki}$, where $a^{-1}_{ik}$ denotes the $(i,k)$ entry of $A^{-1}$. If we take $B$ to be the matrix with $b_{ji}=1$ and zero entries elsewhere, we have $\textrm{tr} (A^{-1} B) = a^{-1}_{ij}.$ Applying the same observation to $\textrm{Adj }(A)\, B,$ we deduce that
$\det (A) A^{-1} = \textrm{ Adj } (A),$ as the matrices are equal entrywise.
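(Again, a small numpy sketch of this entrywise comparison; the particular indices $i, j$ chosen below are arbitrary.)

```python
# Entrywise check of det(A) * A^{-1} = Adj(A): take B with b_{ji} = 1 and zeros
# elsewhere, so tr(A^{-1} B) = (A^{-1})_{ij}, and compare det(A) * tr(A^{-1} B)
# with the cofactor C_{ji} = (Adj A)_{ij}.
import numpy as np

rng = np.random.default_rng(2)
n, i, j = 4, 2, 0
A = rng.standard_normal((n, n))

B = np.zeros((n, n))
B[j, i] = 1.0                          # b_{ji} = 1, zero entries elsewhere

lhs = np.linalg.det(A) * np.trace(np.linalg.inv(A) @ B)

minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
cofactor = (-1) ** (i + j) * np.linalg.det(minor)   # C_{ji}
print(lhs, cofactor)                   # should agree
```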
Of course this theorem has a geometric interpretation! In a sense, it's a multidimensional analogue of «the volume of a parallelepiped is the product of the area of its base and its height».
3. Let's start with the $3\times 3$ case:
$$
\left|\begin{matrix}u_1&u_2&u_3\\v_1&v_2&v_3\\w_1&w_2&w_3\end{matrix}\right|=
u_1\left|\begin{matrix}v_2&v_3\\w_2&w_3\end{matrix}\right|
-u_2\left|\begin{matrix}v_1&v_3\\w_1&w_3\end{matrix}\right|
+u_3\left|\begin{matrix}v_1&v_2\\w_1&w_2\end{matrix}\right|.
$$
The LHS is the volume of the parallelepiped spanned by the three vectors $u$, $v$, and $w$. What's the meaning of the RHS? Clearly it is a scalar product of $u$ with something, namely with the vector
$$
\left(\left|\begin{matrix}v_2&v_3\\w_2&w_3\end{matrix}\right|,
-\left|\begin{matrix}v_1&v_3\\w_1&w_3\end{matrix}\right|,\left|\begin{matrix}v_1&v_2\\w_1&w_2\end{matrix}\right|\right)=
\left|\begin{matrix}\overrightarrow{e_1}&\overrightarrow{e_2}&\overrightarrow{e_3}\\v_1&v_2&v_3\\w_1&w_2&w_3\end{matrix}\right|
$$ that is, with the vector product of $v$ and $w$.
So the formula we get is $vol\langle u,v,w\rangle=(u,[v,w])$; now, by the (geometric) definition of the scalar product, this equals $area\langle v,w\rangle\cdot (|u|\sin\phi)$, where $\phi$ is the angle between $u$ and the plane of the base: the first factor is the area of the base and the second is the height of our parallelepiped.
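(A quick numerical check of this triple-product reading, assuming numpy.)

```python
# det with rows u, v, w equals the scalar triple product u . (v x w).
import numpy as np

rng = np.random.default_rng(3)
u, v, w = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)

print(np.linalg.det(np.vstack([u, v, w])))   # signed volume
print(np.dot(u, np.cross(v, w)))             # should be the same number
```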
n. Consider the (general) case of vectors in $n$-dimensional space $V$. On the RHS of the theorem we again see a scalar product of the first vector, $v$, with a vector $B$ (in coordinate-free language it really lives in $\Lambda^{n-1}V$, but let's ignore this for now) whose coordinates are the cofactors $C_{1i}$.
The question is: what is the geometric meaning of $B$? Let me give three (closely related) answers.
- By the very same cofactor theorem, it measures the $(n-1)$-dimensional area of the projection of the base of our $n$-parallelepiped (i.e. the $(n-1)$-parallelepiped spanned by all the vectors but $v$) onto different hyperplanes; more precisely, the area of the projection onto the hyperplane orthogonal to a unit vector $v$ is the scalar product $(B,v)$.
- Let's prove the cofactor theorem instead of using it. The function $(B,x)$ is linear in $x$. For a basis vector $x=e_i$ we have $(B,x)=C_{1i}$, which (up to sign, at least) is the area of the parallelepiped spanned by the projections of our vectors onto the hyperplane orthogonal to $e_i$. So $(B,x)$ is indeed the area of the projection of the base onto the hyperplane orthogonal to $x$ (multiplied by $|x|$ and taken with the appropriate sign).
- Even better, since everything is invariant under (special) orthogonal transforms, let's change basis to make $v$ a scalar multiple of $e_1$. Now the statement «$(B,v)$ is $|v|$ times the area of the projection» becomes obvious: we literally multiply $|v|$ by the cofactor that is manifestly equal to this area (this was discussed in (2) anyway).
Now I must admit the statement we get is more like «the volume of a parallelepiped $\langle u,\text{base}\rangle$ is the product of the length of $u$ and the area of the projection of its base onto the hyperplane orthogonal to $u$», but it is of course equivalent to «the volume of a parallelepiped is the product of the area of its base and its height».
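(For the skeptical, a small numpy sketch of the claims above in dimension $4$: the vector $B$ of first-row cofactors satisfies $(B,x)=\det$ of the matrix with $x$ in the first row, and $|C_{1i}|$ is by construction the $(n-1)$-volume of the projection of the base onto the hyperplane orthogonal to $e_i$. The array name `base` is just illustrative.)

```python
# (B, x) = det([x; base]), i.e. cofactor expansion along the first row.
import numpy as np

rng = np.random.default_rng(4)
n = 4
base = rng.standard_normal((n - 1, n))   # the n-1 vectors spanning the base, as rows

# C_{1i} = (-1)^i * det(base with column i deleted)  (0-indexed sign convention);
# |C_{1i}| is exactly the (n-1)-volume of the projection of the base onto the
# hyperplane orthogonal to e_i (dropping the i-th coordinate *is* that projection).
Bvec = np.array([(-1) ** i * np.linalg.det(np.delete(base, i, axis=1))
                 for i in range(n)])

x = rng.standard_normal(n)
print(np.dot(Bvec, x))                       # (B, x)
print(np.linalg.det(np.vstack([x, base])))   # det([x; base]) -- should agree
```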
Best Answer
Not every permutation matrix has determinant $-1$, but the elementary matrices which are permutation matrices (corresponding to interchanges of two rows) have determinant $-1$. The easy way to see this is that (1) the identity matrix has determinant $1$, and (2) interchanging two rows or columns of a matrix multiplies its determinant by $-1$.
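(A quick numerical illustration with numpy: a single row interchange has determinant $-1$, while a $3$-cycle, being a product of two interchanges, has determinant $+1$.)

```python
# Permutation matrices: a transposition (row swap) has determinant -1,
# while a 3-cycle is a product of two transpositions and has determinant +1.
import numpy as np

swap = np.eye(3)[[1, 0, 2]]      # interchange rows 0 and 1 of the identity
cycle = np.eye(3)[[1, 2, 0]]     # a 3-cycle of the rows

print(np.linalg.det(swap))       # -1.0 (up to floating-point rounding)
print(np.linalg.det(cycle))      # +1.0
```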