You seem to be thinking that $\sqrt{\text{Var}(\bar X-\bar Y)} = \sqrt{\text{Var}(\bar X)} + \sqrt{\text{Var}(\bar Y)}$.
This is not the case, even for independent variables.
For $X,Y$ independent, $\text{Var}(\bar X-\bar Y) = \text{Var}(\bar X) + \text{Var}(\bar Y)$.
Further,
$\text{Var}(\bar X) = \text{Var}(\frac{1}{n}\sum_iX_i) = \frac{1}{n^2}\text{Var}(\sum_iX_i)= \frac{1}{n^2}\sum_i\text{Var}(X_i)= \frac{1}{n^2}\cdot n\cdot\sigma^2_1= \sigma^2_1/n$
(if the $X_i$ are independent of each other).
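A quick Monte Carlo sketch of this fact (the population, sample size, and seed are illustrative choices of mine, not anything from the derivation above):

```python
# Monte Carlo check that Var(sample mean) ≈ sigma^2 / n for iid draws.
# The population N(0, sigma=2) and n = 25 are arbitrary illustrative choices.
import random
import statistics

random.seed(0)
sigma, n, reps = 2.0, 25, 20_000

# Draw many independent samples of size n and record each sample mean.
means = [
    statistics.fmean(random.gauss(0, sigma) for _ in range(n))
    for _ in range(reps)
]

empirical = statistics.variance(means)
theoretical = sigma**2 / n            # = 0.16 here
print(empirical, theoretical)         # the two should be close
```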
http://en.wikipedia.org/wiki/Variance#Basic_properties
In summary, the correct expression:
$\color{red}{(1)}$ has $\sigma^2/n$ terms because we're looking at averages and that's the variance of an average of independent random variables;
$\color{red}{(2)}$ has a $+$ because the two samples are independent, so their variances (of the averages) add; and
$\color{red}{(3)}$ has a square root because we want the standard deviation of the distribution of the difference in sample means (the standard error of the difference in means). The quantity under the square root is the variance of that difference (the square of the standard error), and taking the square root of a squared standard error gives back a standard error.
The reason we don't just add standard errors is that standard errors don't add: for independent samples, the standard error of the difference in means is NOT the sum of the standard errors of the sample means; the sum will always be too large. The variances do add, though, so we can use them to work out the standard error.
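Here is a small simulation sketch of that point; the sample sizes and population standard deviations are arbitrary choices for illustration:

```python
# For two independent samples, Var(Xbar - Ybar) ≈ Var(Xbar) + Var(Ybar),
# while the SDs of the two sample means do NOT add (their sum is too big).
import random
import statistics

random.seed(1)
n, m, reps = 30, 50, 20_000
s1, s2 = 3.0, 2.0                     # illustrative population SDs

diffs, xbars, ybars = [], [], []
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(0, s1) for _ in range(n))
    ybar = statistics.fmean(random.gauss(0, s2) for _ in range(m))
    xbars.append(xbar)
    ybars.append(ybar)
    diffs.append(xbar - ybar)

se_diff = statistics.stdev(diffs)     # ≈ sqrt(s1**2/n + s2**2/m)
sum_of_ses = statistics.stdev(xbars) + statistics.stdev(ybars)
print(se_diff, sum_of_ses)            # the naive sum overstates the true SE
```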
Here's some intuition for why it's variances that add, rather than standard deviations.
To make things a little simpler, just consider adding random variables.
If $Z = X+Y$, why is $\sigma_Z \leq \sigma_X+\sigma_Y$?
Imagine $Y = kX$ for some $k > 0$; that is, $X$ and $Y$ are perfectly linearly dependent. They always 'move together' in the same direction and in proportion.
Then $Z = (k+1)X$, which is simply a rescaling. Since $\sigma_Y = k\sigma_X$, clearly $\sigma_Z = (k+1)\sigma_X = \sigma_X+\sigma_Y$.
That is, when $X$ and $Y$ are perfectly positively linearly dependent, always moving up or down together, standard deviations add.
When they don't always move together, they sometimes move in opposite directions. Those movements partly 'cancel out', yielding a smaller standard deviation than the direct sum.
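The three cases (perfect positive dependence, independence, perfect negative dependence) can be sketched like this; the standard normal populations are arbitrary choices of mine:

```python
# sd(X + Y) versus sd(X) + sd(Y) under three dependence patterns.
import random
import statistics

random.seed(2)
N = 50_000
X = [random.gauss(0, 1) for _ in range(N)]
W = [random.gauss(0, 1) for _ in range(N)]   # independent of X

cases = {
    "Y = 2X  (moves with X)":    [2 * x for x in X],
    "Y independent of X":        W,
    "Y = -X  (moves against X)": [-x for x in X],
}

results = {}
for label, Y in cases.items():
    Z = [x + y for x, y in zip(X, Y)]
    results[label] = (statistics.stdev(Z), statistics.stdev(X) + statistics.stdev(Y))
    print(label, results[label])
```

Only in the first case do the standard deviations add exactly; independence gives roughly $\sqrt{2}$ rather than $2$, and the opposing case cancels completely.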
"regardless of the estimate and the sampling procedure?"
No, you will not have a "root-n" effect regardless of those things, since at least some standard errors do not shrink like $1/\sqrt{n}$.
Many do (quite possibly all the ones you're likely to use), but that's not all of them.
For things whose standard error does shrink like $1/\sqrt{n}$, you expect to halve the standard error by quadrupling the sample size. So (at least if we ignore sampling variation in the estimate of $\sigma$) that's probably what you need.
One example of something that isn't proportional to $\frac{1}{\sqrt{n}}$ is the standard error of a kernel density estimate when the bandwidth is itself chosen as a function of $n$. [For some common choices of bandwidth formula the standard error goes down as $n^{-2/5}$ instead.]
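A rough simulation sketch of that last claim, using a Gaussian kernel and a Silverman-style bandwidth $h = 1.06\,n^{-1/5}$ (with the true $\sigma = 1$ plugged in for simplicity; the kernel, bandwidth rule, and sample sizes here are all my illustrative choices):

```python
# The SE of a kernel density estimate at x = 0 shrinks like n^(-2/5),
# not n^(-1/2), when the bandwidth h is itself chosen as a function of n.
import math
import random
import statistics

random.seed(3)

def kde_at_zero(sample, h):
    """Gaussian-kernel density estimate, evaluated at x = 0."""
    return statistics.fmean(
        math.exp(-(x / h) ** 2 / 2) / math.sqrt(2 * math.pi) for x in sample
    ) / h

def se_of_kde(n, reps=3000):
    h = 1.06 * n ** (-1 / 5)          # Silverman-style rule, sigma = 1 assumed
    estimates = [
        kde_at_zero([random.gauss(0, 1) for _ in range(n)], h)
        for _ in range(reps)
    ]
    return statistics.stdev(estimates)

ratio = se_of_kde(100) / se_of_kde(400)
print(ratio)   # near 4**(2/5) ≈ 1.74, noticeably below the root-n value of 2
```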
Best Answer
This comes from the fact that $\newcommand{\Var}{\operatorname{Var}}\newcommand{\Cov}{\operatorname{Cov}}\Var(X+Y) = \Var(X) + \Var(Y) + 2\cdot\Cov(X,Y)$ and for a constant $a$, $\Var( a X ) = a^2 \Var(X)$.
Since we assume that the individual observations are independent, the $\Cov(X,Y)$ term is $0$; and since we assume that the observations are identically distributed, all the variances are $\sigma^2$. So
$\Var( \frac{1}{n} \sum X_i ) = \frac{1}{n^2} \sum \Var(X_i) = \frac{1}{n^2} \times \sum_{i=1}^n \sigma^2= \frac{n}{n^2} \sigma^2 = \frac{\sigma^2}{n}$
And when we take the square root of that (because the variance is on a squared scale, which is harder to think about) we get $\dfrac{\sigma}{\sqrt{n}}$.
More intuitively, think of two statistics classes. In the first, the teacher assigns each student to draw a sample of size 10 from a set of tiles with numbers on them (the teacher knows the true mean of this population, but the students don't) and compute the mean of their sample. The second teacher assigns each of his/her students to take samples of size 100 from the same set of tiles and compute the mean.

Would you expect every sample mean to exactly match the population mean, or to vary about it? Would you expect the spread of the sample means to be the same in both classes, or would the second class tend to be closer to the population mean?

That's why it makes sense to divide by a function of the sample size. The square root means we have a law of diminishing returns: to halve the standard error you need to quadruple the sample size.
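The two-classes thought experiment can be sketched directly; the tile values, class sizes, and seed below are all invented for illustration (and sampling is done with replacement so the draws are iid):

```python
# Spread of sample means for the two classes: samples of size 10 vs size 100.
import random
import statistics

random.seed(4)
tiles = list(range(1, 11))      # a made-up population of numbered tiles

def spread_of_means(n, students=5000):
    """SD of the sample means from `students` students, each drawing n tiles."""
    means = [statistics.fmean(random.choices(tiles, k=n)) for _ in range(students)]
    return statistics.stdev(means)

se10, se100 = spread_of_means(10), spread_of_means(100)
print(se10, se100)              # the size-100 class clusters much more tightly
print(se10 / se100)             # close to sqrt(10), matching sigma / sqrt(n)
```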
As for the name, the full name is "the estimated standard deviation of the sampling distribution of $\bar x$"; it only takes saying that a few times before you appreciate having a shortened form. I don't know who first substituted "error" for "deviation" this way, but it stuck. The standard deviation measures variability of individual observations; the standard error measures variability in estimates of parameters (based on observations).