$\def\vect{\mathbf}
\def\diag{{\rm{diag}}}
\def\R{\mathbb R}
\def\vol{{\rm vol}}
\def\sign{{\rm sign}}
$
There are two ideas involved: One is that uniqueness of Lebesgue measure as a translation invariant Borel measure implies scaling invariance under linear transformations, and in particular absolute invariance under orthogonal transformations. The other is one or another multiplicative decomposition of a linear transformation. I will use the singular value decomposition.
Theorem. Lebesgue measure $\lambda$ on $\mathbb R^n$ is, up to scaling by a positive constant, the unique Borel measure that is translation invariant and locally finite (i.e., the measure of every compact set is finite).
This is contained in Rudin, Real and Complex Analysis, 3rd edition, Theorem 2.20.
Corollary.
(1) If $T$ is any invertible linear transformation of $\mathbb R^n$, then there exists a positive constant $c_T$ such that for all Borel sets $E$, $\lambda(T(E)) = c_T\, \lambda(E)$.
(2) If $S$, $T$ are invertible linear transformations, then $c_{ST} = c_S c_T$.
(3) If $U$ is an orthogonal linear transformation, then $c_U = 1$.
Proof. For (1), note that $E \mapsto \lambda(T(E))$ is a translation invariant, locally finite Borel measure, so by the Theorem it is a positive multiple $c_T\,\lambda$ of Lebesgue measure. Part (2) is immediate from the definition of $c_T$. For part (3), it suffices to find a Borel set $B$ such that $0 < \lambda(B) < \infty$ and $U(B) = B$. But the closed unit ball $B = \{x : \|x\| \le 1\}$ will do.
Thus Lebesgue measure is invariant under orthogonal transformations as well as under translations.
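The corollary is easy to probe numerically. Below is a minimal Monte Carlo sanity check of parts (2) and (3), assuming numpy; the helper `scale_factor`, the random matrices, and the sample size are my own choices, and the output is an estimate, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

def scale_factor(T, samples=200_000):
    """Monte Carlo estimate of c_T = lambda(T(E)) for E = [0,1]^n.

    A point xi lies in T(E) iff T^{-1} xi lies in E, so sample a
    bounding box of T(E) and count hits."""
    # bounding box of T(E): coordinatewise extremes over the 2^n vertices
    verts = T @ np.array(np.meshgrid(*[[0.0, 1.0]] * n)).reshape(n, -1)
    lo, hi = verts.min(axis=1), verts.max(axis=1)
    xi = rng.uniform(lo, hi, size=(samples, n))
    x = np.linalg.solve(T, xi.T).T               # preimages under T
    inside = np.all((x >= 0) & (x <= 1), axis=1)
    return np.prod(hi - lo) * inside.mean()

S = rng.normal(size=(n, n))                      # generic, hence invertible
T = rng.normal(size=(n, n))
U, _ = np.linalg.qr(rng.normal(size=(n, n)))     # a random orthogonal matrix

print(scale_factor(U))                                         # ~1: part (3)
print(scale_factor(S @ T), scale_factor(S) * scale_factor(T))  # part (2)
```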
Lemma (Singular value decomposition). For any invertible matrix $A$, there exist two orthogonal matrices $W$, $V$ and a diagonal matrix $D = \diag(a_1, \dots, a_n)$, with $a_i > 0$, such that $A = W D V$.
Remark: One can easily derive this from the polar decomposition and vice versa.
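This is also exactly the factorization that numerical libraries compute. As a short illustration (assuming numpy, whose `np.linalg.svd` returns $W$, the diagonal entries $a_i$, and $V$):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))          # generic, hence invertible (a.s.)

# numpy's svd returns exactly this factorization: A = W @ diag(a) @ V
W, a, V = np.linalg.svd(A)
assert np.allclose(A, W @ np.diag(a) @ V)
assert np.allclose(W @ W.T, np.eye(4))           # W orthogonal
assert np.allclose(V @ V.T, np.eye(4))           # V orthogonal
assert np.all(a > 0)                             # positive singular values
print(a)
```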
Corollary. For any invertible $A$, $c_A = |\det(A)|$.
Proof. Write $A = W D V$, as in the Lemma, with $D = \diag(a_1, \dots, a_n)$.
Then $|\det(A)| = \prod_i a_i$. On the other hand,
$c_A = c_W c_D c_V = c_D$, by parts (2) and (3) of the Corollary. Since $D$ applied to the unit hypercube yields a rectangular solid with edge lengths $a_1, \dots, a_n$, it follows that
$c_A = c_D = \prod a_i = |\det(A)|$.
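As a quick numerical check of this bookkeeping (assuming numpy; the matrix is a random example, illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
_, a, _ = np.linalg.svd(A)

# |det A| = |det W| * det D * |det V| = a_1 * ... * a_n
print(abs(np.linalg.det(A)), np.prod(a))   # agree up to rounding
```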
Lemma. The Lebesgue measure of an affine hyperplane is zero.
Proof. It suffices to consider the coordinate hyperplane perpendicular to $\vect e_n$, using translation and orthogonal invariance. Moreover, it suffices to show that the measure of any bounded subset $K$ of this coordinate hyperplane is zero. But $K$ is contained in a rectangular solid of arbitrarily small measure.
Corollary. Let $v_1, \dots, v_n$ be vectors in $\R^n$ and let $P$ be the parallelepiped spanned by $v_1, \dots, v_n$. Then $\lambda(P) = |\det(v_1, \dots, v_n)|$. Moreover, the signed volume of $P$ is $\det(v_1, \dots, v_n)$.
Proof. If the $v_i$ are linearly dependent, then $P$ has measure zero, since $P$ is contained in a hyperplane, and the determinant is zero as well. Otherwise, let $A$ be the matrix $(v_1, \dots, v_n)$. Then $P$ is the image of the unit hypercube under $A$, so
$\lambda(P) = c_A = |\det(A)|$. The last statement follows from the definition of signed volume, namely
$$
\vol(v_1, \dots, v_n) = \sign(\det(v_1, \dots, v_n)) \lambda(P) = \det(v_1, \dots, v_n).
$$
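For a concrete instance in $\R^3$, the determinant can be compared against an independently computed convex-hull volume. A minimal sketch, assuming numpy and scipy (the particular vectors are arbitrary choices of mine):

```python
import numpy as np
from scipy.spatial import ConvexHull

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 2.0, 0.0])
v3 = np.array([0.0, 1.0, 3.0])

# vertices of P: all sums of subsets of {v1, v2, v3}
coeffs = np.array(np.meshgrid([0, 1], [0, 1], [0, 1])).reshape(3, -1).T
verts = coeffs @ np.stack([v1, v2, v3])

print(ConvexHull(verts).volume)                            # 6.0
print(abs(np.linalg.det(np.column_stack([v1, v2, v3]))))   # 6.0
```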
Remark: Occasionally one sees an explanation for the scaling of Lebesgue measure or for the formula for the Lebesgue measure of a parallelepiped which invokes the change of variable formula for integration. But these explanations are circular, as the conceptual basis for the change of variable formula is the local scaling of Lebesgue measure, which depends on the global scaling of Lebesgue measure under a linear transformation.
We can actually make further reductions. Suppose $T:\R^n\to\R^n$ is the linear transformation
\begin{align}
T(x_1,\dots, x_n)&=(x_1+x_2,x_2,\dots, x_n).
\end{align}
In terms of matrices, we are taking the second row of the identity matrix and adding it to the first row. It suffices to restrict attention to this particular shear: row swaps reduce the general operation of adding $c$ times one row to another to an operation on rows $1$ and $2$, and a diagonal scaling then reduces the multiplier to $c=1$.
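To make the second reduction precise (this is one way to do it, not spelled out above): writing $E_{12}$ for the matrix with a single nonzero entry, a $1$ in position $(1,2)$, and $D = \diag(c, 1, \dots, 1)$, one checks the identity $I + cE_{12} = D\,(I + E_{12})\,D^{-1}$. Since $c_D\, c_{D^{-1}} = c_{DD^{-1}} = 1$ by part (2) of the Corollary, the shear with multiplier $c$ scales volume exactly as the shear with multiplier $1$ does. A small numpy check of this identity (illustration only):

```python
import numpy as np

n, c = 4, 2.5
E12 = np.zeros((n, n))
E12[0, 1] = 1.0
D = np.diag([c] + [1.0] * (n - 1))

lhs = np.eye(n) + c * E12                        # add c times row 2 to row 1
rhs = D @ (np.eye(n) + E12) @ np.linalg.inv(D)   # conjugated c = 1 shear
assert np.allclose(lhs, rhs)
print(np.linalg.det(lhs))                        # 1.0: shears preserve volume
```

Now, writing $Q = [0,1]^n$ for the unit cube,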
\begin{align}
T(Q)&=\{\xi\in\R^n \,:\, \xi_2\leq \xi_1\leq \xi_2+1\quad\text{and}\quad \xi_2,\dots, \xi_n\in [0,1]\}
\end{align}
(i.e. just put $\xi_1=x_1+x_2$ and $\xi_j=x_j$ for $j\geq 2$, and rewrite the inequalities $x_i\in [0,1]$ for all $i$ in terms of $\xi$). Now, we have
\begin{align}
\text{vol}(T(Q))&=\int_{T(Q)}1\,dV\\
&=\int_{[0,1]^{n-2}}\int_0^1\int_{\xi_2}^{\xi_2+1}1\,d\xi_1\,d\xi_2\, d(\xi_3,\dots, \xi_n)\tag{by Fubini}\\
&=\int_0^1(\xi_2+1-\xi_2)\,d\xi_2\\
&=1.
\end{align}
Here, it's clear that the integral over the last $n-2$ coordinates is trivially $1$ (this is just the $(n-2)$-dimensional volume of the cube $[0,1]^{n-2}$).
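For a reader who wants to double-check the value without Fubini, here is a minimal Monte Carlo sketch for $n=3$, assuming numpy (the bounding box and sample count are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, samples = 3, 1_000_000

# T(Q) sits inside [0,2] x [0,1]^(n-1); sample that box and count hits
xi = rng.uniform(size=(samples, n)) * np.array([2.0] + [1.0] * (n - 1))
inside = (xi[:, 1] <= xi[:, 0]) & (xi[:, 0] <= xi[:, 1] + 1)
print(2.0 * inside.mean())   # ~1.0, matching the Fubini computation
```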
Best Answer
This is essentially equivalent to showing that a linear operator represented by a matrix $A$ takes shapes in the domain to shapes in the range, and multiplies their volume by $|\det(A)|$.
This is the way I convinced myself of this many years ago. First, it is obviously true for diagonal matrices. Next, it is also true for shear matrices, which are of the form identity matrix plus a matrix all of whose entries are zero except one off-diagonal entry. Finally, it is obviously true for permutation matrices. Now you merely need to prove that every invertible matrix is a product of these kinds of matrices, and this can be done, for example, by going through the steps of the Gauss-Jordan method for solving systems of linear equations. Of course, you also need to convince yourself that if a matrix $A$ multiplies volumes by a factor $a$, and another matrix $B$ multiplies volumes by a factor $b$, then $AB$ multiplies volumes by a factor $ab$. And you can see this by approximating any reasonable shape by lots of little parallelepipeds that fill in the shape.
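A hedged sketch of that Gauss-Jordan factorization, assuming numpy (the function `elementary_factors` and the pivoting details are my own; it produces elementary matrices whose inverses, multiplied in order, reconstruct $A$, and the inverse of each elementary matrix is again elementary):

```python
import numpy as np

def elementary_factors(A, tol=1e-12):
    """Gauss-Jordan reduce A to the identity, returning elementary
    matrices E_1, ..., E_m with E_m ... E_1 A = I.  Each factor is
    a permutation (row swap), a diagonal scaling, or a shear."""
    A = A.astype(float)
    n = len(A)
    factors = []
    def apply(E):
        nonlocal A
        A = E @ A
        factors.append(E)
    for j in range(n):
        p = j + int(np.argmax(np.abs(A[j:, j])))  # partial pivoting
        if p != j:                                # row swap (permutation)
            P = np.eye(n)
            P[[j, p]] = P[[p, j]]
            apply(P)
        D = np.eye(n)                             # scale pivot to 1 (diagonal)
        D[j, j] = 1.0 / A[j, j]
        apply(D)
        for i in range(n):                        # clear column j (shears)
            if i != j and abs(A[i, j]) > tol:
                S = np.eye(n)
                S[i, j] = -A[i, j]
                apply(S)
    return factors

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))                       # generic, hence invertible
inv_factors = [np.linalg.inv(E) for E in elementary_factors(A)]

recon = np.eye(4)                                 # A = E_1^{-1} ... E_m^{-1}
for F in inv_factors:
    recon = recon @ F
assert np.allclose(recon, A)

# the volume factor multiplies along the factorization
print(abs(np.linalg.det(A)),
      np.prod([abs(np.linalg.det(F)) for F in inv_factors]))
```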