There is a good reason to use $z$ instead of $e^{sT}$. Before starting the analysis, let me remind you that in signals and systems we are interested in the frequency spectrum of the signal, i.e. the Laplace transform evaluated on the imaginary axis $s = jw$. And since your signal $x[n]$ is discrete, its frequency spectrum is periodic, so it is more general to define $s=jwT$.
Now, let $X(z) = \mathcal{Z}\{x[n]\}$ be the Z-transform of a discrete signal $x[n]$, causal or non-causal, i.e.
$$ X(z) = \sum_{n=-\infty}^{\infty} x[n] z^{-n}. $$
Since $z\in\mathbb{C}$ we have $z = |z| e^{j\arg z}$. Without loss of generality we write $|z| = r$ and $\arg z = wT$, i.e. $z=r e^{jwT}$ (note that $r$ is not necessarily equal to $1$). Then
$$ \begin{aligned}
X(z) &= \sum_{n=-\infty}^{\infty} x[n] z^{-n}\\
&= \sum_{n=-\infty}^{\infty} x[n] (r e^{jwT})^{-n}\\
% &= \sum_{n=-\infty}^{\infty} (x[n] r^{-n}) e^{-njwT}\\
&= \sum_{n=-\infty}^{\infty} (x[n] r^{-n}) (e^{jwT})^{-n},
\end{aligned} $$
which implies $X(z) = \left. \mathcal{L}\{x[n]r^{-n}\} \right\rvert_{s=jwT} = \mathcal{F}\{x[n]r^{-n}\}$. As a consequence, $X(z)$ is more general than the Fourier transform $X(e^{jwT}) = \mathcal{F}\{x[n]\}$ of our signal of interest: it is the Fourier transform of the exponentially weighted signal $x[n]r^{-n}$, and it reduces to $X(e^{jwT})$ when $r = 1$.
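The identity above can be checked numerically on a finite-length signal. The following is only a minimal sketch: the test signal, the starting index `n0`, and the values of `r` and `theta` (standing in for $wT$) are arbitrary choices made for illustration, and the helper names `z_transform` and `dtft` are hypothetical.

```python
import cmath

def z_transform(x, n0, z):
    """Finite sum of x[n] * z**(-n); x[0] holds the sample at index n0."""
    return sum(xn * z ** (-(n0 + k)) for k, xn in enumerate(x))

def dtft(x, n0, theta):
    """Finite sum of x[n] * exp(-j*n*theta), with theta playing the role of wT."""
    return sum(xn * cmath.exp(-1j * theta * (n0 + k)) for k, xn in enumerate(x))

# Assumed non-causal test signal covering n = -2, ..., 3 (illustrative values).
x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5]
n0 = -2

# Assumed evaluation point z = r * exp(j*theta), deliberately off the unit circle.
r, theta = 1.3, 0.7
z = r * cmath.exp(1j * theta)

# Exponentially weighted signal x[n] * r**(-n).
weighted = [xn * r ** (-(n0 + k)) for k, xn in enumerate(x)]

lhs = z_transform(x, n0, z)       # X(z) at z = r e^{j theta}
rhs = dtft(weighted, n0, theta)   # Fourier transform of x[n] r^{-n}
print(abs(lhs - rhs))             # should be at rounding-error level
```

Setting `r = 1.0` makes `weighted` equal to `x`, which is the special case where $X(z)$ coincides with the ordinary Fourier transform.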
So, if the region of convergence of $X(z)$ does not contain the unit circle $|z| = 1$, then $X(e^{jwT})$, the Fourier transform of $x[n]$, does not exist. This is a real problem because many signals have this convergence issue, e.g. non-causal signals such as a digital image filter. Therefore, it is convenient (and even necessary for non-causal signals) to use the Z-transform.
Or informally, use $z$ instead of $\left. e^{sT} \right\rvert_{s=jw} = e^{jwT}$ whenever you can.
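To make the convergence issue concrete, here is a small sketch using the signal $x[n] = a^n u[n]$ with $a = 2$. This signal is my own choice of example, not one taken from the discussion above. On the unit circle the partial sums blow up, so the Fourier transform does not exist, while for $r = 3 > |a|$ the sum settles to the closed form $1/(1 - a z^{-1})$.

```python
import cmath

def partial_sum(a, z, N):
    """Partial sum over n = 0..N-1 of a**n * z**(-n), for x[n] = a**n u[n]."""
    return sum((a / z) ** n for n in range(N))

a = 2.0        # assumed growth rate of the example signal
theta = 0.7    # assumed value of wT

on_circle = cmath.exp(1j * theta)       # r = 1: the Fourier-transform case
outside = 3.0 * cmath.exp(1j * theta)   # r = 3 > |a|: inside the region of convergence

# For |z| > |a| the geometric series converges to 1 / (1 - a/z).
closed_form = 1.0 / (1.0 - a / outside)
err = abs(partial_sum(a, outside, 200) - closed_form)

# On the unit circle the magnitude of the partial sums keeps growing with N.
growth = [abs(partial_sum(a, on_circle, N)) for N in (10, 20, 40)]

print(err)
print(growth)
```

The weight $r^{-n}$ is what tames the growth of $a^n$, which is why the choice of $r$ decides whether the sum converges.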
We also recommend seeing this link about the radius of convergence.
At this point, it is clear that the Z-transform has the same objective as the Laplace transform: to ensure the convergence of the transform in some region of $\mathbb{C}$. The Z-transform does this for discrete signals and the Laplace transform for continuous signals.
As far as I know, an early reference for a thorough mathematical theory (in terms of today's mathematical language) of the Laplace transform and its inversion is the set of books by Gustav Doetsch. (There are several: a three-volume handbook, some books on applications, a practical guide...) Probably one of these books also provides insight into the history (i.e. early references, Heaviside's formal treatment,...) but I don't know which of them are available in English.
Moreover, there are two articles called "The development of the Laplace transform" by Deakin here and here that could be helpful.
You probably already found the Monthly article "What is the Laplace transform?"?
Best Answer
I realize that this is an older question, but I'll answer it anyway in case other people are looking for the answer.
In electrical engineering, "the Laplace transform" usually refers to the two-sided Laplace transform. Therefore the function being transformed does not have to be causal, meaning that the function can equal something other than $0$ for $t < 0$. (Here's a wiki reference to the two-sided Laplace transform: https://en.wikipedia.org/wiki/Two-sided_Laplace_transform )
Using this version of the Laplace transform,
$$\mathcal{L}\{ u(-t)\}(s) = \int_{-\infty}^\infty u(-t)e^{-st} \ dt = \int_{-\infty}^0 e^{-st} \ dt = -\frac{1}{s}$$
This is true only when $\operatorname{Re}[s] < 0$, which is this transform's region of convergence.
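As a quick sanity check, the integral can be approximated numerically for one point inside the region of convergence. This is only a sketch: the test point `s`, the truncation length `L`, and the step count are assumptions made for illustration, and `laplace_u_minus_t` is a hypothetical helper name.

```python
import cmath

def laplace_u_minus_t(s, L=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp(-s*t) over [-L, 0].

    Truncating at -L is harmless when Re(s) < 0, because the integrand
    decays like exp(-|Re(s)| * |t|) as t goes to minus infinity.
    """
    h = L / steps
    total = 0j
    for k in range(steps):
        t = -L + (k + 0.5) * h   # midpoint of the k-th subinterval
        total += cmath.exp(-s * t)
    return total * h

s = complex(-1.0, 2.0)   # assumed test point with Re(s) < 0
approx = laplace_u_minus_t(s)
exact = -1.0 / s

print(approx, exact)
```

For a point with $\operatorname{Re}[s] > 0$ the integrand grows as $t \to -\infty$, so the truncated integral would depend strongly on `L` instead of approaching $-1/s$.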
I hope this helps!
Sources: I'm an electrical engineering student.