The general proof is not difficult.
From the definition of a determinant (a signed sum of products), the expansion must be a polynomial in $x_1,x_2,\dots,x_n$ of degree $0+1+2+\cdots+(n-1)=\dfrac{(n-1)n}2$, and the coefficient of every term is $\pm1$.
On the other hand, the determinant vanishes whenever $x_j=x_k$, so the polynomial must be a multiple of
$$(x_1-x_2)(x_1-x_3)(x_1-x_4)\cdots(x_1-x_n)\\
(x_2-x_3)(x_2-x_4)\cdots(x_2-x_n)\\
(x_3-x_4)\cdots(x_3-x_n)\\
\cdots\\
(x_{n-1}-x_n)$$ ($\dfrac{(n-1)n}2$ factors in total).
Since this product already has degree $\dfrac{(n-1)n}2$, the remaining factor is a constant, and the determinant has no choice but to equal $\pm$ this product.
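As a quick sanity check (not part of the proof), the claim is easy to verify for a small $n$ with a computer algebra system. The sketch below assumes Python with sympy installed:

```python
# Sanity check for n = 4 (sympy assumed available).
from sympy import Matrix, expand, prod, symbols

n = 4
x = symbols(f"x1:{n + 1}")  # x1, x2, x3, x4

# Vandermonde matrix: row i is (1, x_i, x_i^2, ..., x_i^(n-1)).
V = Matrix([[xi**j for j in range(n)] for xi in x])

# The n(n-1)/2 factors (x_i - x_j) over all pairs i < j, as listed above.
p = expand(prod(x[i] - x[j] for i in range(n) for j in range(i + 1, n)))

d = expand(V.det())
assert d == p or d == -p  # the determinant is indeed ± the product
```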
For the $3\times3$ case,
$$\begin{vmatrix} 1 & x_1 & x_1^2\\ 1 &x_2 & x_2^2\\ 1 & x_3 &
x_3^2 \end{vmatrix}=
\begin{vmatrix} 1 & x_1 & x_1^2\\ 0 &x_2-x_1 & x_2^2-x_1^2\\ 0 & x_3-x_1 &
x_3^2-x_1^2 \end{vmatrix}=\begin{vmatrix} x_2-x_1 & x_2^2-x_1^2\\ x_3-x_1 &
x_3^2-x_1^2 \end{vmatrix}=(x_2-x_1)(x_3-x_1)\begin{vmatrix} 1&x_2+x_1 \\1& x_3+x_1 \end{vmatrix}=(x_2-x_1)(x_3-x_1)(x_3-x_2).$$
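This particular result can also be spot-checked with sympy (my tooling choice, not part of the argument):

```python
# Verify the 3x3 computation above (sympy assumed available).
from sympy import Matrix, expand, symbols

x1, x2, x3 = symbols("x1 x2 x3")
M = Matrix([[1, x1, x1**2],
            [1, x2, x2**2],
            [1, x3, x3**2]])
assert expand(M.det() - (x2 - x1)*(x3 - x1)*(x3 - x2)) == 0
```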
I'll use the Vandermonde determinant in the form
$$
V(x) = \begin{vmatrix}
1 & 1 & \ldots & 1 \\
x_1 & x_2 & \ldots & x_n \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{n-2} & x_2^{n-2} & \ldots & x_n^{n-2} \\
x_1^{n-1} & x_2^{n-1} & \ldots & x_n^{n-1}
\end{vmatrix}.
$$
Here $x = (x_1, \ldots, x_n)$. This differs from your definition by at most a factor of $(-1)$.
It is well known that
$$
V(x) = \prod_{1 \le i < j \le n} (x_j - x_i) \, .
$$
The first part of the following computation is essentially what Shannon Starr wrote on MathOverflow, and also what heropup wrote in their answer.
Using the logarithmic derivative we get
$$
\frac{\partial}{\partial x_k} V(x) =
\sum_{\substack{i=1\\ i\neq k}}^n \frac{ V(x)}{x_k - x_i} \, .
$$
It follows that
$$
\frac{\partial^2}{\partial x_k^2} V(x) =
\sum_{\substack{i=1\\ i\neq k}}^n \left( \sum_{\substack{j=1\\ j\neq k}}^n \frac{ V(x)}{(x_k - x_i)(x_k - x_j)}
- \frac{ V(x)}{(x_k - x_i)^2}\right)
= \sum_{\substack{i,j=1\\ i\neq k, j \ne k, i \ne j}}^n \frac{ V(x)}{(x_k - x_i)(x_k - x_j)}
$$
and therefore, writing $D = \sum_{k=1}^n \frac{\partial^2}{\partial x_k^2}$ for the Laplacian,
$$
D V(x) = V(x) \sum_{\substack{i,j, k=1\\ i\neq k, j \ne k, i \ne j}}^n \frac{ 1}{(x_k - x_i)(x_k - x_j)} \, .
$$
This identity holds for all $x = (x_1, \ldots, x_n)$ with pairwise distinct components $x_i$.
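For what it's worth, the logarithmic-derivative formula and the resulting identity for $DV(x)$ can be verified symbolically for a small $n$; the sketch below assumes Python with sympy installed:

```python
# Symbolic check of the derivative identities for n = 4 (sympy assumed).
from sympy import Matrix, cancel, symbols

n, k = 4, 1  # spot-check derivatives with respect to x_2 (0-based k)
x = symbols(f"x1:{n + 1}")

# V(x) as defined above: entry (i, j) is x_j^i, powers increasing down the rows.
V = Matrix([[xj**i for xj in x] for i in range(n)]).det()

# Logarithmic-derivative formula.
assert cancel(V.diff(x[k]) - sum(V / (x[k] - x[i])
                                 for i in range(n) if i != k)) == 0

# Laplacian identity D V(x) = V(x) * S(x), summing over pairwise distinct indices.
S = sum(1 / ((x[c] - x[a]) * (x[c] - x[b]))
        for a in range(n) for b in range(n) for c in range(n)
        if a != b and a != c and b != c)
assert cancel(sum(V.diff(xk, 2) for xk in x) - V * S) == 0
```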
Now the second part: We show that
$$
S(x) = \sum_{i, j, k}^n {\vphantom{\sum}}' \frac{ 1}{(x_k - x_i)(x_k - x_j)}
$$
is zero for all $x = (x_1, \ldots, x_n)$ with pairwise distinct components $x_i$. Here I have used $\sum_{i, j, k}^n {\vphantom{\sum}}'$ as an abbreviation for the sum over all pairwise distinct $(i, j, k)$.
With a partial fraction decomposition one gets
$$
S(x) = \sum_{i, j, k}^n {\vphantom{\sum}}' \frac{1}{x_i-x_j}\left( \frac{ 1}{x_k - x_i}- \frac{ 1}{x_k - x_j}\right) \\
= -\sum_{i, j, k}^n {\vphantom{\sum}}' \frac{ 1}{(x_i - x_j)(x_i - x_k)}
-\sum_{i, j, k}^n {\vphantom{\sum}}' \frac{ 1}{(x_j - x_i)(x_j - x_k)} \, .
$$
The two sums on the right are both equal to $S(x)$ (only the indexing is different), i.e.
$$
S(x) = -2 S(x) \implies S(x) = 0 \, .
$$
It follows that $DV(x) = 0$ (first for distinct $x_i$, and then for arbitrary arguments because of continuity). This concludes the proof.
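Both conclusions can be confirmed for a small case, again assuming sympy (with exact rational arithmetic for the numerical part):

```python
# n = 4: the Laplacian of V vanishes identically, and S(x) vanishes
# at a sample point with pairwise distinct components (sympy assumed).
from fractions import Fraction
from sympy import Matrix, expand, symbols

n = 4
x = symbols(f"x1:{n + 1}")
V = Matrix([[xj**i for xj in x] for i in range(n)]).det()
assert expand(sum(V.diff(xk, 2) for xk in x)) == 0  # D V(x) = 0

pts = [Fraction(v) for v in (1, 2, 5, 7)]  # arbitrary distinct rationals
S = sum(1 / ((pts[k] - pts[i]) * (pts[k] - pts[j]))
        for i in range(n) for j in range(n) for k in range(n)
        if i != j and i != k and j != k)
assert S == 0
```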
1. It's easier to follow with a simple case, for example a $3 \times 3$ determinant with the known expansion:
$$ \begin{vmatrix} a_{3} &a_{2} &a_{1} \\ b_{3} &b_{2} &b_{1} \\ c_{3} &c_{2} &c_{1} \end{vmatrix} \;\;=\;\; -a_1 b_2 c_3 + a_1 b_3 c_2 + a_2 b_1 c_3 - a_2 b_3 c_1 - a_3 b_1 c_2 + a_3 b_2 c_1 \tag{1} $$
Making the substitutions $a_3 \to a^2, a_2 \to a^1, a_1 \to a^0 \dots\,$ and using the Vandermonde expression:
$$ \begin{vmatrix} a^{2} &a &1 \\ b^{2} &b &1 \\ c^{2} &c &1 \end{vmatrix} \;\;=\;\; (a-b)(a-c)(b-c) \tag{2} $$
Expanding the product in $(2)$:
$$ \begin{align} (a-b)(a-c)(b-c) \;&=\; a^2 b - a^2 c - a b^2 + a c^2 + b^2 c - b c^2 \\ \;&=\; a^2 b^1c^0 - a^2 b^0 c^1 - a^1 b^2 c^0 + a^1 b^0 c^2 + a^0 b^2 c^1 - a^0 b^1 c^2 \tag{3} \end{align} $$
Finally, making the reverse substitutions $a^2 \to a_3, a^1 \to a_2, a^0 \to a_1$ in $(3)$ turns out to give the same expression as the right hand side of $(1)$.
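For the record, the whole round trip can be mimicked with sympy (my choice of tool, not part of the original argument). The exponent of each variable in every monomial, including the implicit power $0$, drives the reverse substitution:

```python
# Reproduce the substitution trick for the 3x3 case (sympy assumed available).
from sympy import Matrix, Poly, expand, symbols

a, b, c = symbols("a b c")
A = symbols("a1:4")  # (a1, a2, a3); likewise for b and c
B = symbols("b1:4")
C = symbols("c1:4")

# (2) -> (3): expand the Vandermonde product.
vdm = expand((a - b) * (a - c) * (b - c))

# Reverse substitution a^k -> a_{k+1} etc., term by term: read off the
# exponent of each variable in every monomial (the implicit ^0 included).
p = Poly(vdm, a, b, c)
result = sum(coeff * A[ea] * B[eb] * C[ec]
             for (ea, eb, ec), coeff in p.terms())

# Compare with the generic expansion (1).
M = Matrix([[A[2], A[1], A[0]], [B[2], B[1], B[0]], [C[2], C[1], C[0]]])
assert expand(result - M.det()) == 0
```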
The "rule" can be formally proved to work in the general case, though calling it a possible definition of the determinant is a bit of a stretch.
2. The substitution can be justified in this very particular context. Without pretending to give a formal proof, the idea is that the substitutions $x_{ik} \to x_i^{k-1}$ are reversible here, because each term of the determinant is a monomial containing exactly one element from each row and each column, so each term in $(3)$ can be unambiguously traced back to the matrix entries whose product was taken. For example, the first term $a^2b$ can only be the product of the $a^2$ in the top left corner, the $b$ in the middle, and the $1$ at the intersection of the remaining row and column, which is the bottom right corner. Because the substitution is reversible, it is possible to translate the problem "purely formally" into the other one with a known result, then transfer that result back to the original problem.
Using such "trick" substitutions must be done very carefully, lest they produce bogus results, or just plain fail. Suppose for example that you tried to mechanically apply the same argument to calculate the following:
$$ \begin{vmatrix} a_2 &a_1 \\ a_1 &a_2 \end{vmatrix} \;\;\to\;\; \begin{vmatrix} a^1 &a^0 \\ a^0 &a^1 \end{vmatrix} \;\;=\;\; \begin{vmatrix} a &1 \\ 1 &a \end{vmatrix} \;\;\to\;\; a^2 - 1 \;\;\to\;\; \text{???} $$
[ EDIT ] What the quoted statement does is essentially the following, where the blue arrows denote the substitutions from and back to the $\left(x_{ij}\right)$ space, and the red arrows indicate algebraic manipulations in the other $\left(x_i^{j-1}\right)$ space.
$$\displaystyle \left|x_{ij}\right| \color{blue}{\to} \left|x_i^{j-1}\right| \color{red}{\to} \prod_{i \lt j} (x_j-x_i) \color{red}{\to} \sum _{\sigma \in S_{n}}\left(\operatorname {sgn}(\sigma )\prod _{i=1}^{n}x_{i}^{\sigma _{i}-1}\right) \color{blue}{\to} \sum _{\sigma \in S_{n}}\left(\operatorname {sgn}(\sigma )\prod _{i=1}^{n}x_{i,\sigma _{i}}\right)$$
This says that, assuming (a) the determinant is multilinear of a certain form, and (b) the Vandermonde formula holds in the particular case of a Vandermonde determinant, the general determinant formula can be unambiguously derived. It goes without saying that this is a theoretical result, not a practical "trick", and not a method for calculating specific determinants.
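The middle red step can likewise be checked directly for a small $n$ (again assuming sympy and the standard library's itertools): expanding the Vandermonde product recovers exactly the signed permutation sum.

```python
# Check for n = 3 that the Vandermonde product expands to the signed
# permutation sum, i.e. the Leibniz formula for |x_i^(j-1)| (sympy assumed).
from itertools import permutations
from sympy import expand, prod, symbols
from sympy.combinatorics import Permutation

n = 3
x = symbols(f"x1:{n + 1}")

vandermonde = prod(x[j] - x[i] for i in range(n) for j in range(i + 1, n))

# sigma[i] plays the role of sigma_i - 1 (0-based exponents).
leibniz = sum(
    Permutation(list(sigma)).signature() * prod(x[i] ** sigma[i] for i in range(n))
    for sigma in permutations(range(n))
)

assert expand(vandermonde - leibniz) == 0
```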