There are, in fact, two different formulas for standard deviation here: The population standard deviation $\sigma$ and the sample standard deviation $s$.
If $x_1, x_2, \ldots, x_N$ denote all $N$ values from a population, then the (population) standard deviation is
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2},$$
where $\mu$ is the mean of the population.
If $x_1, x_2, \ldots, x_N$ denote $N$ values from a sample, however, then the (sample) standard deviation is
$$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2},$$
where $\bar{x}$ is the mean of the sample.
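To make the difference between the two formulas concrete, here is a minimal sketch that computes both by hand and compares them with Python's `statistics` module. The data values and the helper names `population_sd` and `sample_sd` are made up for illustration.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def population_sd(values):
    """Standard deviation with divisor N (treats values as the whole population)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """Standard deviation with divisor N - 1 (treats values as a sample)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


sigma = population_sd(data)
s = sample_sd(data)

print("population SD:", sigma, "statistics.pstdev:", statistics.pstdev(data))
print("sample SD:    ", s, "statistics.stdev: ", statistics.stdev(data))
```

The only difference between the two helpers is the divisor, so $s$ is always slightly larger than $\sigma$ computed on the same numbers.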
The reason for the change in formula with the sample is this: When you're calculating $s$ you are normally using $s^2$ (the sample variance) to estimate $\sigma^2$ (the population variance). The problem, though, is that if you don't know $\sigma$ you generally don't know the population mean $\mu$, either, and so you have to use $\bar{x}$ in the place in the formula where you normally would use $\mu$. Doing so introduces a slight bias into the calculation: Since $\bar{x}$ is calculated from the sample, the values of $x_i$ are on average closer to $\bar{x}$ than they would be to $\mu$, and so the sum of squares $\sum_{i=1}^N (x_i - \bar{x})^2$ turns out to be smaller on average than $\sum_{i=1}^N (x_i - \mu)^2$. It just so happens that that bias can be corrected by dividing by $N-1$ instead of $N$. (Proving this is a standard exercise in an advanced undergraduate or beginning graduate course in statistical theory.) The technical term here is that $s^2$ (because of the division by $N-1$) is an unbiased estimator of $\sigma^2$.
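The bias described above can be seen in a small simulation. This is only a sketch under assumed settings: the population is taken to be normal with mean 0 and standard deviation 2 (so $\sigma^2 = 4$), and the sample size, number of trials, and seed are arbitrary choices.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

N = 5            # sample size (hypothetical)
TRIALS = 20000   # number of simulated samples (hypothetical)
TRUE_VAR = 4.0   # variance of the assumed N(0, 2^2) population

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 2.0) for _ in range(N)]
    x_bar = sum(sample) / N
    sum_sq = sum((x - x_bar) ** 2 for x in sample)
    total_div_n += sum_sq / N
    total_div_n_minus_1 += sum_sq / (N - 1)

avg_div_n = total_div_n / TRIALS                  # tends toward (N-1)/N * sigma^2
avg_div_n_minus_1 = total_div_n_minus_1 / TRIALS  # tends toward sigma^2

print("true variance:        ", TRUE_VAR)
print("average, divide by N:  ", avg_div_n)
print("average, divide by N-1:", avg_div_n_minus_1)
```

With these settings the divide-by-$N$ average should settle near $\frac{N-1}{N}\sigma^2 = 3.2$, while the divide-by-$(N-1)$ average should settle near $4$.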
Another way to think about it is that with a sample you have $N$ independent pieces of information. However, since $\bar{x}$ is the average of those $N$ pieces, if you know $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_{N-1} - \bar{x}$, you can figure out what $x_N - \bar{x}$ is. So when you're squaring and adding up the residuals $x_i - \bar{x}$, there are really only $N-1$ independent pieces of information there. So in that sense perhaps dividing by $N-1$ rather than $N$ makes sense. The technical term here is that there are $N-1$ degrees of freedom in the residuals $x_i - \bar{x}$.
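The degrees-of-freedom argument rests on the fact that the residuals $x_i - \bar{x}$ always sum to zero, so the last one is determined by the others. A short sketch with made-up numbers:

```python
# Hypothetical sample, chosen only for illustration.
sample = [3.0, 7.0, 8.0, 12.0]

x_bar = sum(sample) / len(sample)
residuals = [x - x_bar for x in sample]

# Pretend we only know the first N - 1 residuals.
known = residuals[:-1]

# Because all N residuals sum to zero, the last one is forced.
recovered_last = -sum(known)

print("residuals:          ", residuals)
print("recovered last one: ", recovered_last)
```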
For more information, see Wikipedia's article on the sample standard deviation.
Say you select samples randomly from the same population. The sample mean $\bar{x}$ is a random variable (a value obtained from the random process of selection), defined as the mean of the values of a given sample. The CLT then says that (among other things) if you take many random samples, all of the same size $N$ with $N$ large enough, and for each sample $s$ with values $x_{s1}, x_{s2}, \ldots, x_{sN}$ you calculate

$$x_s := \frac{x_{s1} + \cdots + x_{sN}}{N},$$

then the values of $x_s$ will be approximately normally distributed, with mean $\mu_s = \mu_{pop}$ and standard deviation equal to the population standard deviation divided by $\sqrt{N}$. Here $\mu_{pop}$ is the true population mean, i.e., the value you would get if you were to add up the values of every single member of your population (which is often impractical and/or too costly) and then divide by the size of the population.
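The statement above can be checked numerically. The sketch below assumes a made-up population (uniform on $[0, 10]$, so $\mu_{pop} = 5$ and the population standard deviation is $10/\sqrt{12}$); the sample size, number of samples, and seed are arbitrary.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, for reproducibility only

N = 30              # size of each sample (hypothetical)
NUM_SAMPLES = 5000  # how many samples are drawn (hypothetical)

# Assumed population: uniform on [0, 10].
pop_mean = 5.0
pop_sd = 10.0 / math.sqrt(12.0)

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.uniform(0.0, 10.0) for _ in range(N)]
    sample_means.append(sum(sample) / N)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_sd = pop_sd / math.sqrt(N)

print("mean of sample means:", mean_of_means, "population mean:", pop_mean)
print("SD of sample means:  ", sd_of_means, "predicted:", predicted_sd)
```

The mean of the sample means should land close to the population mean, and their spread close to the population standard deviation divided by $\sqrt{N}$.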
Say, now (assuming we don't know the true value of the population mean; otherwise there is no point in collecting sample data), you collect a sample $s'$ of size $N$ from your population, and you get a sample mean $x_{s'}$. Unless you know the value of $\mu_{pop}$ in advance, there is no way of knowing whether it equals $x_{s'}$; the best you can do is to determine an interval centered at $x_{s'}$ that contains the value of $\mu_{pop}$ with a certain probability. For this, you use the CLT: specifically, the value of $x_{s'}$ will lie a certain number $k$ of standard deviations (of the sample mean $\bar{x}$) from the mean $\mu_{pop}$. Since $\bar{x}$ is (approximately) normally distributed, you can look up the probability of obtaining a value within $k$ standard deviations of the mean, and this gives you the interval. Let me expand on this a little later.
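As a rough illustration of the interval idea, here is a sketch that builds such an interval from the normal approximation. It assumes the population standard deviation is known, which is a simplification; the function name `mean_confidence_interval` and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Interval centered at sample_mean, using the normal approximation.

    Assumes pop_sd (the population standard deviation) is known.
    """
    # Number of standard deviations k that captures the requested probability.
    k = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = k * pop_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical inputs: sample mean 50, population SD 10, sample size 100.
low, high = mean_confidence_interval(50.0, 10.0, 100)
print("95% interval:", low, high)
```

For a 95% level, $k$ is about 1.96, so with these inputs the interval extends about 1.96 on either side of the sample mean.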
Best Answer
If you mean "normally distributed", then the distribution of the sample mean is normal with the same expected value as the population mean, namely $12$, and with standard deviation equal to the standard deviation of the population divided by $\sqrt{40}$. Thus it is $4/\sqrt{40}\approx0.6324555\ldots$. The number $10$ deviates from the expected value by $10-12=-2$. If you divide that by the standard deviation of the sample mean, you get $-2/0.6324555\ldots\approx-3.1622\ldots$. That means you're looking at a number about $3.1622$ standard deviations below the mean. You should have a table giving the probability of being below a number that's a specified number of standard deviations above or below the mean.
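If no table is at hand, the same arithmetic can be done in a few lines of Python. This sketch assumes the quantity of interest is the probability that the sample mean falls below $10$; the variable names are illustrative.

```python
import math
from statistics import NormalDist

pop_mean = 12.0   # expected value of the population
pop_sd = 4.0      # standard deviation of the population
n = 40            # sample size
observed = 10.0   # value of the sample mean being asked about

# Standard deviation of the sample mean.
se = pop_sd / math.sqrt(n)

# How many of those standard deviations the observed value lies from the mean.
z = (observed - pop_mean) / se

# Probability that the sample mean falls below the observed value.
p_below = NormalDist(pop_mean, se).cdf(observed)

print("SD of sample mean:", se)
print("z-score:          ", z)
print("P(mean < 10):     ", p_below)
```

The resulting probability is a little under $0.001$, which matches what a standard normal table gives for a value about $3.16$ standard deviations below the mean.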
If you don't mean normally distributed, then the sample size of $40$ tells us that, if the population distribution is not too skewed, the sample mean will be nearly normally distributed even if the population is not.
The expected value and standard deviation of the sample mean stated above do not depend on whether the population is normally distributed nor even on whether it's highly skewed.
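Both points can be illustrated with a deliberately skewed population. The sketch below assumes an exponential population with mean $1$ and standard deviation $1$ (an arbitrary choice, not the population from the question) and keeps the sample size of $40$; the number of trials and the seed are arbitrary.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed, for reproducibility only

N = 40          # sample size, as in the question
TRIALS = 10000  # number of simulated samples (hypothetical)

# Assumed skewed population: exponential with rate 1 (mean 1, SD 1).
pop_mean = 1.0
pop_sd = 1.0

means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N
    for _ in range(TRIALS)
]

se = pop_sd / math.sqrt(N)
mean_of_means = statistics.mean(means)
sd_of_means = statistics.pstdev(means)

# For a normal distribution, about 95% of values lie within 1.96 SDs of the mean.
within = sum(1 for m in means if abs(m - pop_mean) <= 1.96 * se) / TRIALS

print("mean of sample means:", mean_of_means, "expected:", pop_mean)
print("SD of sample means:  ", sd_of_means, "expected:", se)
print("fraction within 1.96 SDs:", within)
```

The mean and standard deviation of the sample means follow the formulas regardless of the skew, and the fraction within 1.96 standard deviations should come out close to the 95% a normal distribution would give.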