Your argument is correct.
With a little effort (but not much), you should be able to modify your argument to make it more geometric.
Here is the two-dimensional version:
If we have a parallelogram spanned by the vectors $\vec{a}$, $\vec{b}$,
then the area of
the parallelogram is the same as the area of the rectangle spanned by $\vec{a}, \vec{b}',$
where $\vec{b}'$ is the component of $\vec{b}$ that is orthogonal to $\vec{a}$,
i.e. $$\vec{b}' = \vec{b} - \dfrac{(\vec{a} \cdot \vec{b})}{(\vec{a} \cdot \vec{a})} \vec{a}.$$
Now the process of going from the matrix $\Bigl( \, \vec{a} \quad \vec{b} \, \Bigr) $ to the matrix $\Bigl( \, \vec{a} \quad \vec{b}' \, \Bigr)$ involves
right-multiplying by the matrix $\begin{pmatrix} 1 & - (\vec{a} \cdot \vec{b})/
(\vec{a}\cdot\vec{a}) \\ 0 & 1\end{pmatrix}$, whose determinant is $1$.
Thus we see that $\Bigl( \, \vec{a} \quad \vec{b} \, \Bigr)$
and $\Bigl( \, \vec{a} \quad \vec{b}' \, \Bigr)$ have the same
determinant, and also describe parallelograms with the same area,
the latter being a rectangle.
This reduces you to checking the relationship between area and determinant in the case of a rectangle. Rotating this rectangle, you can make its edges parallel to the coordinate axes. Again, a rotation matrix has determinant one, so you are reduced
to checking the relationship between determinants and areas in the case
of a rectangle whose sides are parallel to the coordinate axes. This
is pretty obvious, and so we are done.
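As a quick numerical sanity check of the two-dimensional argument, here is a small numpy sketch (the vectors $\vec{a}$, $\vec{b}$ are arbitrary examples chosen for illustration):

```python
import numpy as np

# Arbitrary example vectors (any non-parallel pair works).
a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])

# b' = b - proj_a(b): the component of b orthogonal to a.
b_perp = b - (a @ b) / (a @ a) * a

# Shearing b to b' is right-multiplication by a unit-determinant matrix,
# so the determinant is unchanged...
det_ab = np.linalg.det(np.column_stack([a, b]))
det_ab_perp = np.linalg.det(np.column_stack([a, b_perp]))
assert np.isclose(det_ab, det_ab_perp)

# ...and it equals the area of the rectangle spanned by a and b'.
area_rectangle = np.linalg.norm(a) * np.linalg.norm(b_perp)
assert np.isclose(abs(det_ab), area_rectangle)
```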
In the three-dimensional case, you can argue similarly: you first reduce to the case where one face is a rectangle, and then to the case where the third edge is perpendicular to the rectangular face, so that the whole thing
is a cuboid. These steps involve right-multiplying by matrices
which are upper triangular with $1$'s down the diagonal, and which thus have det $= 1$.
Now applying a bunch of rotations (again, each has det $= 1$), you can make
your cuboid have sides parallel to the coordinate axes, at which point the
formula is again pretty obvious.
Note that if we allow ourselves one more step --- namely, multiplying by a diagonal matrix (which, geometrically, is a rescaling of each of the coordinate axes) --- then we can start with any (non-degenerate) parallelepiped and convert it to
the standard cuboid with unit length sides sitting at the origin.
In linear algebra terms, this can be restated as the fact that any matrix
with non-zero determinant can be written as the product of a diagonal matrix,
some rotation matrices, and an upper triangular matrix with $1$'s down the diagonal.
Combining the diagonal and upper triangular matrix with $1$'s down the diagonal,
we obtain an upper triangular matrix with non-zero entries down the diagonal.
In other words, any matrix with non-zero determinant can be written as a product
of some rotation matrices with an upper triangular matrix.
This is usually called the $QR$ decomposition in linear algebra textbooks;
in more theoretical treatments it is called the Iwasawa decomposition.
So what I have just given is a geometric description of the $QR$ decomposition.
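Numerically, this decomposition is exactly what `np.linalg.qr` computes; here is a quick sketch on a random example matrix. (Note that numpy's $Q$ is only guaranteed orthogonal, so it may include a reflection with det $= -1$; the absolute values below sidestep that.)

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))  # random example; almost surely invertible

# QR decomposition: Q orthogonal (det = +/-1), R upper triangular.
Q, R = np.linalg.qr(A)
assert np.isclose(abs(np.linalg.det(Q)), 1.0)

# |det A| = |det R| = product of |diagonal entries of R|: the volume of
# the axis-aligned cuboid obtained after all the shears and rotations.
assert np.isclose(abs(np.linalg.det(A)), np.prod(np.abs(np.diag(R))))
```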
The difference between what I've described and your argument is that you use
elementary matrices, while I use just upper triangular matrices with $1$'s down
the diagonal. These arise from the geometric process of projecting one vector
to make it perpendicular to another, which is where the geometric perspective is
coming from.
Let's have a little lesson on determinants and why they correspond to volumes in the first place.
Exterior and Clifford algebra: algebras of planes, volumes, and other interesting subspaces
You might have realized that conventional linear algebra--the algebra of directions, as it were--is somewhat inadequate for talking about planes, volumes, and the like. These are interesting geometric objects, but they do not correspond to vectors in vector algebra. Conventional approaches devise some clever ways around this problem: they use normal vectors for planes in 3d, for instance, but this doesn't generalize.
An algebraic solution to this issue is to use something that builds up from vector algebra. Clifford algebra is a good choice: compared to vector algebra, the extra stuff you need is pretty minimal.
Clifford algebra follows from the introduction of a "geometric product" of vectors: if $a$ is a vector, then $aa \equiv |a|^2$. If $a, b$ are orthogonal, then $ab = -ba$. If $u, v, w$ are vectors, then $(uv)w = u(vw)$. So, we have behavior like the dot product, behavior like the cross product, and associativity.
The geometric product allows us to build up multivectors. A product of $k$ orthogonal vectors is called a $k$-blade. Vectors are 1-blades, but two orthogonal vectors $u,v$ would form a 2-blade $uv$.
Now, you might realize that it's clumsy to have to deal with only orthogonal or parallel vectors. Given two vectors $c, d$ that are neither parallel nor orthogonal, you can break down the geometric product like so:
$$cd = c_{\parallel} d + c_{\perp} d \equiv c \cdot d + c \wedge d$$
The dot and wedge used here are traditional. And they make sense for general blades. Let $B$ be some general $k$-blade. Then the expression
$$cB = c_\parallel B + c_\perp B = c \cdot B + c \wedge B$$
has a similar meaning: there are two well-defined parts of the product from looking at the projection of $c$ onto $B$ and the "rejection" of $c$ that is perpendicular to $B$. Note, however, that if $B$ is a $k$-dimensional subspace, a $k$-blade, then $c \cdot B$ is not a scalar. Rather, $c \cdot B$ is a $(k-1)$-dimensional subspace, and $c \wedge B$ is $(k+1)$-dimensional. Think about how this worked for the vector-vector case if that doesn't make sense.
Now, if you wanted to build up a parallelepiped from three vectors, you could wedge them together like so:
$$u \wedge v \wedge w = u_{\perp(v \wedge w)} v_{\perp(w)} w$$
The relationship with linear algebra: the geometric interpretation of determinant
The role of determinants here is somewhat misunderstood. Traditional linear algebra doesn't really explain it as well as Clifford algebra can.
Given a linear operator $\underline T$, which we often represent with a matrix with respect to some basis*, there is a natural extension of that linear operator to blades. In particular, consider the definition
$$\underline T(u \wedge v \wedge w \wedge \ldots) \equiv \underline T(u) \wedge \underline T(v) \wedge \underline T(w) \wedge \ldots$$
Now, realize that in any $n$-dimensional space, the vector space of $n$-blades is like that of scalars: let $i$ denote one such $n$-blade. Then all other $n$-blades are scalar multiples of $i$. Think of a volume: all other volumes are scalar multiples of that volume in terms of numerical size. Negative volumes are just oriented differently (left-hand vs. right-hand rule).
Indeed, this gives a nice, geometric definition of the determinant that can be understood with the machinery of the clifford algebra:
$$\underline T(i) \equiv (\det \underline T) i$$
So the determinant becomes understood as an eigenvalue, the eigenvalue of any $n$-blade.
But this relationship has often been understood the wrong way around: because the determinant gives us a way to measure a given $n$-blade against a reference $n$-blade (in this case, $i$), people often assemble a linear operator from the $n$ linearly independent vectors spanning the (hyper-)parallelepiped they care about, and then take a determinant to find the size of that parallelepiped. This is very backwards; the Clifford algebra gives a direct way to do that with algebraic operations. Namely, if $I$ is the $n$-blade you want to consider and $i$ is the reference $n$-blade you want to measure its size against, just take
$$I i^{-1} \equiv I i/i^2$$
This is a nice, well-defined quantity.
Geometric interpretation of wedge products
So in fact, the relationship to determinants here is somewhat incidental. We really want to understand, geometrically, how to calculate the magnitude of an $n$-blade.
Again, the geometric product gives an obvious interpretation. Given a wedge product of $n$ vectors, we can rewrite that product as a geometric product using rejections. Again, remember this example:
$$u \wedge v \wedge w = u_{\perp(v \wedge w)} v_{\perp(w)} w$$
You could write that in terms of a unit 3-blade $i$ and some trig functions:
$$u \wedge v \wedge w = |u||v||w| i\sin \theta_{v,w} \sin \theta_{u,v\wedge w}$$
And these angles have immediate geometric significance: $\theta_{v,w}$ is the angle $v$ makes with $w$. $\theta_{u,v\wedge w}$ is the angle $u$ makes with the plane formed by $v \wedge w$. This can be done successively.
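This "successive rejections" picture is easy to check numerically. The sketch below (with arbitrary example vectors, using a hypothetical helper `reject` built from ordinary Gram-Schmidt) multiplies the base length by the successive heights and compares against the determinant:

```python
import numpy as np

def reject(x, *basis):
    """Component of x orthogonal to span(basis): orthogonalize the basis
    by Gram-Schmidt, then subtract the projections of x onto it."""
    ortho = []
    for b in basis:
        for o in ortho:
            b = b - (b @ o) / (o @ o) * o
        ortho.append(b)
    for o in ortho:
        x = x - (x @ o) / (o @ o) * o
    return x

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 1.0])
w = np.array([1.0, 0.0, 1.0])

# |u ^ v ^ w| = |w| * |v rejected from w| * |u rejected from span(v, w)|
volume = (np.linalg.norm(w)
          * np.linalg.norm(reject(v, w))
          * np.linalg.norm(reject(u, v, w)))
assert np.isclose(volume, abs(np.linalg.det(np.column_stack([u, v, w]))))
```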
Geometric interpretation of Laplace expansion
But your question was specifically about the Laplace expansion. Well, let me consider an explicit calculation of one of these wedge products:
$$u \wedge v \wedge w = u^1 e_1 \wedge (v \wedge w) + u^2 e_2 \wedge (v \wedge w) + u^3 e_3 \wedge (v \wedge w)$$
This is basically what Laplace expansion does: in each $v \wedge w$ term, we neglect any terms that involve the vector we're wedging on to it. So in the $e_1 \wedge (v \wedge w)$ term, we neglect anything in the parentheses that would have $e_1$ in it. In 3d, those are only terms that look like $e_2 e_3$, and so you get the familiar terms of $(v^2 w^3 - v^3 w^2) e_2 e_3$.
It should be understood, then, that when Laplace expansion is invoked, we're explicitly doing the calculation of taking the vector we expanded along and, in essence, finding the rejection of that vector from the $(n-1)$-dimensional subspace formed by the other vectors in the matrix (rejection = the part perpendicular to that subspace). We're finding the "height" of the parallelepiped with respect to a certain base. And the whole process of finding a determinant here is just repeatedly finding heights, and then the corresponding areas or volumes that those heights help form.
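For concreteness, here is a minimal recursive implementation of Laplace expansion along the first column, checked against numpy's determinant (the matrix is an arbitrary example):

```python
import numpy as np

def laplace_det(A):
    """Determinant via Laplace (cofactor) expansion along the first column."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        # Minor: delete row i and column 0; (-1)**i tracks orientation.
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
        total += (-1) ** i * A[i, 0] * laplace_det(minor)
    return total

M = [[1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
assert np.isclose(laplace_det(M), np.linalg.det(M))
```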
One way to see Cramer's rule is that it simply makes use of a (very inefficient) way of calculating $A^{-1}$, specifically \begin{equation}A^{-1}=\frac{1}{\det(A)}\operatorname{Adj}(A),\end{equation} where $\operatorname{Adj}(A)$ is known nowadays as the adjugate of $A$ (the transpose of the matrix of cofactors of $A$). If you substitute this equation into $X=A^{-1}B$ you basically get Cramer's rule, since for every component $x_i$ of $X$ you get the Laplace expansion along the $i$th column of $\frac{1}{\det(A)}\det(A_{C_i\leftrightarrow B})$ (you can check this...).
How far you take the geometric interpretation is up to you. If you just see Cramer's rule as a way of calculating $A^{-1}B$, then it simply says that if a vector $X$ is transformed by $A$ to give the vector $B$, and the transformation is invertible, then you can recover $X$ by applying the inverse transformation to $B$. If you want to go further, you have to explain geometrically why \begin{equation}A^{-1}=\frac{1}{\text{det}(A)}\text{Adj}(A).\end{equation} In terms of a geometrical interpretation, we know $A$ is a transformation, and det($A$) is the (signed) area of the parallelogram formed by the column vectors of $A$. So what can be interesting is to explore the role of the adjugate and how it determines the formula for det$(A)$...if you want to...I'll leave this for you - in 2D it's simple, since the cofactor of each entry is just another entry (with a sign change where needed).
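As a numerical check of both identities, here is a sketch (with an arbitrary 2x2 example, and a hypothetical helper `adjugate` built straight from the cofactor definition):

```python
import numpy as np

def adjugate(A):
    """Adjugate: transpose of the matrix of cofactors."""
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[2.0, 1.0], [5.0, 3.0]])
b = np.array([4.0, 7.0])

# A^{-1} = Adj(A) / det(A):
assert np.allclose(np.linalg.inv(A), adjugate(A) / np.linalg.det(A))

# Cramer's rule: x_i = det(A with column i replaced by b) / det(A).
x = np.empty(2)
for i in range(2):
    Ai = A.copy()
    Ai[:, i] = b
    x[i] = np.linalg.det(Ai) / np.linalg.det(A)
assert np.allclose(x, np.linalg.solve(A, b))
```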