You are correct that you can express $Z$ either of the ways that you wrote; that’s what it means to be “equal”. However, you have a misunderstanding about the central limit theorem, which explicitly concerns $Z$, not $\bar X$.
People often like to think of the sample mean as converging so that the following holds asymptotically, since it is an algebraic rearrangement of the central limit theorem:
$$
\bar X_n \sim N(\mu, \sigma^2/n)
$$
Such a notion is problematic, as it makes the convergence target a moving target: the variance $\sigma^2/n$ changes as the sample size $n$ increases. Further, for a distribution whose support is not the whole real line, this notation suggests the sample mean could fall off the support, such as $\bar X=-1$ for an exponential distribution, which never produces negative values. (A normal distribution has support on the entire real line.)
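If it helps to see the support problem concretely, here is a small Python sketch (NumPy/SciPy, with the seed and sample sizes being arbitrary choices of mine): the $N(\mu, \sigma^2/n)$ approximation puts positive probability below zero for the mean of exponential draws, while the simulated means never get there.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu = sigma = 1.0          # Exp(1) has mean 1 and standard deviation 1

for n in (5, 20, 100):
    # Simulate 100,000 sample means of n exponential draws each
    means = rng.exponential(scale=mu, size=(100_000, n)).mean(axis=1)
    # Probability of a negative mean under the N(mu, sigma^2/n) approximation
    p_neg = norm.cdf(0, loc=mu, scale=sigma / np.sqrt(n))
    print(f"n={n:4d}  min simulated mean={means.min():.4f}  "
          f"normal approx P(mean < 0)={p_neg:.2e}")
```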
Further, the sum itself does not come anywhere close to converging. Consider $X_1,\dots,X_n\overset{iid}{\sim} U(1,2)$. That uniform distribution meets the assumptions of the classical central limit theorem, yet a sum of those values diverges off to infinity, something like $1+1.1+1.7+1.2+\dots$ Dividing by the sample size is exactly what keeps the mean controlled and prevents it from exploding off to infinity.
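A quick simulation makes the contrast visible (a sketch; the seed and the checkpoints are arbitrary choices): the running sum of $U(1,2)$ draws grows without bound, while the running mean settles near $1.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 2, size=1_000_000)   # X_i iid U(1, 2), with mean 1.5

for n in (10, 1_000, 100_000, 1_000_000):
    # The sum grows roughly like 1.5 * n; the mean stabilizes near 1.5
    print(f"n={n:>9,}  sum={x[:n].sum():>12,.1f}  mean={x[:n].mean():.4f}")
```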
Best Answer
Nice question (+1)!!
You will remember that for independent random variables $X$ and $Y$, $Var(X+Y) = Var(X) + Var(Y)$ and $Var(a\cdot X) = a^2 \cdot Var(X)$. So the variance of $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma^2 = n\sigma^2$, and the variance of $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is $n\sigma^2 / n^2 = \sigma^2/n$.
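Both identities are easy to verify by simulation. This sketch (the distribution, $n$, and the number of replications are arbitrary choices) compares the simulated variances of the sum and of the mean to $n\sigma^2$ and $\sigma^2/n$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 200_000
sigma2 = 4.0                              # population variance of N(0, 4)

# reps independent samples, each of size n
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
print("Var(sum):  simulated", samples.sum(axis=1).var(), " theory", n * sigma2)
print("Var(mean): simulated", samples.mean(axis=1).var(), " theory", sigma2 / n)
```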
That takes care of the variance. To standardize a random variable, you subtract its expected value and divide by its standard deviation. As you know, the expected value of $\bar{X}$ is $\mu$, so the variable
$$ \frac{\bar{X} - E\left( \bar{X} \right)}{\sqrt{ Var(\bar{X}) }} = \sqrt{n} \frac{\bar{X} - \mu}{\sigma}$$ has expected value 0 and variance 1. So if it tends to a Gaussian, it has to be the standard Gaussian $\mathcal{N}(0,\;1)$. Your formulation in the first equation is equivalent. By multiplying the left hand side by $\sigma$ you set the variance to $\sigma^2$.
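As a sanity check, you can simulate the standardized quantity $\sqrt{n}(\bar X - \mu)/\sigma$ and confirm that it has mean near 0, variance near 1, and quantiles close to those of $\mathcal{N}(0,1)$. The sketch below uses exponential draws (an arbitrary choice, so that the underlying distribution is visibly non-normal):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 500, 50_000
mu = sigma = 1.0                      # Exp(1): mean 1, standard deviation 1

xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma  # the standardized sample mean

print(f"mean = {z.mean():.4f}, variance = {z.var():.4f}")
for q in (0.025, 0.5, 0.975):
    print(f"quantile {q}: simulated {np.quantile(z, q):+.3f}, N(0,1) {norm.ppf(q):+.3f}")
```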
Regarding your second point, I believe that the equation shown above illustrates that you have to divide by $\sigma$, not $\sqrt{\sigma}$, to standardize the variable, which explains why you use $s_n$ (the estimator of $\sigma$) and not $\sqrt{s_n}$.
Addition: @whuber suggests discussing why the scaling is by $\sqrt{n}$. He does so there, but because his answer is very long I will try to capture the essence of his argument (which is a reconstruction of de Moivre's thoughts).
If you add a large number $n$ of +1's and -1's, you can approximate the probability that the sum takes a given value $j$ by elementary counting. The log of this probability is proportional to $-j^2/n$. So if we want that probability to converge to a constant as $n$ grows large, we have to use a normalizing factor of order $\sqrt{n}$.
Using modern (post de Moivre) mathematical tools, you can see the approximation mentioned above by noticing that the sought probability is
$$P(j) = \frac{{n \choose n/2+j}}{2^n} = \frac{n!}{2^n(n/2+j)!(n/2-j)!}$$
which we approximate by Stirling's formula
$$ P(j) \approx \frac{n^n e^{n/2+j} e^{n/2-j}}{2^n e^n (n/2+j)^{n/2+j} (n/2-j)^{n/2-j} } = \left(\frac{1}{1+2j/n}\right)^{n/2+j} \left(\frac{1}{1-2j/n}\right)^{n/2-j}. $$
$$ \log(P(j)) \approx -(n/2+j) \log(1+2j/n) - (n/2-j) \log(1-2j/n) \\ \approx -2j(n/2+j)/n + 2j(n/2-j)/n \propto -j^2/n, $$
where the second step uses $\log(1+x) \approx x$.
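If you want to verify this conclusion numerically, the following Python sketch (the choice of $n$ and of the $j$ values is arbitrary, and a log-gamma helper stands in for the huge factorials) compares $\log(P(j)/P(0))$ with $-2j^2/n$. The constant $2$ is what you get by also keeping the second-order term of the logarithm above; the proportionality to $-j^2/n$ is what matters for the argument.

```python
from math import lgamma, log

def log_binom(n, k):
    """log of C(n, k) via log-gamma, accurate even for very large n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

n = 10_000                    # number of +1/-1 steps (chosen even)

def log_P(j):
    """log P(j) = log[ C(n, n/2 + j) / 2^n ]."""
    return log_binom(n, n // 2 + j) - n * log(2)

for j in (10, 30, 100, 300):
    lhs = log_P(j) - log_P(0)  # log of the ratio P(j)/P(0)
    print(f"j={j:4d}  log P(j)/P(0) = {lhs:9.4f}   -2j^2/n = {-2 * j * j / n:9.4f}")
```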