If this is a completely balanced within-subject design with $27\times 6=162$ observations, then you can calculate the marginal means directly: simply average over the levels of the second factor. Of course, you have to be sure that averaging over different conditions is meaningful for your planned experiment: do you expect each of those conditions to be present with about 1/3 probability?
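As a minimal sketch (the cell means below are hypothetical; the balance of the design is what justifies the unweighted average):

```python
import numpy as np

# Hypothetical cell-mean table: rows = levels of the factor of interest,
# columns = levels of the second factor we average over.
cell_means = np.array([[4.1, 3.8, 4.4],
                       [5.0, 4.7, 5.3]])

# In a balanced design every cell has the same number of observations,
# so the marginal mean is the unweighted average across columns.
marginal_means = cell_means.mean(axis=1)
print(marginal_means)  # [4.1  5. ]
```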
The real difficulty is with the variance of the difference. It is well known that $$Var(X-Y) = Var(X) + Var(Y) - 2 SD(X)SD(Y)Corr(X,Y)$$
The problem is that you don't know the within-subject correlation.
Option 1. You could simply guess at a value: would you expect the correlation to be high or low? Since higher correlation leads to lower variance of the difference, you could assume the worst-case scenario of zero correlation and be guaranteed to overestimate the required sample size (unless the true correlation is negative, but that is rare).
Option 2. If the published results have more information, like a p-value from a test, you could try to figure out the correlation. For a complicated design like this one, it might be difficult to do analytically, but you could try a simulation approach. Given a correlation coefficient, simulate data with the given means and variances, run the test and check the p-value. Modify the correlation coefficient until you get close to the published result.
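Here is a minimal sketch of that loop, assuming normally distributed within-subject responses and a paired $t$-test; all the numbers below (means, SDs, $n$, and the published p-value) are hypothetical placeholders for the actual published figures:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical published summaries: substitute the real ones.
mean_x, mean_y = 10.0, 9.0    # condition means
sd_x, sd_y = 3.0, 3.0         # condition standard deviations
n = 27                        # number of subjects
published_p = 0.04            # hypothetical reported p-value

def mean_p_value(rho, n_sims=1000):
    """Average paired-t p-value when the within-subject correlation is rho."""
    cov = [[sd_x**2, rho * sd_x * sd_y],
           [rho * sd_x * sd_y, sd_y**2]]
    ps = []
    for _ in range(n_sims):
        x, y = rng.multivariate_normal([mean_x, mean_y], cov, size=n).T
        ps.append(stats.ttest_rel(x, y).pvalue)
    return np.mean(ps)

# Scan candidate correlations; keep the one whose simulated p-value is
# closest to the published result.
grid = np.linspace(0.0, 0.9, 10)
best = min(grid, key=lambda r: abs(mean_p_value(r) - published_p))
print("correlation most consistent with the published p-value:", best)
```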
You seem to be thinking that $\sqrt{\text{Var}(\bar X-\bar Y)} = \sqrt{\text{Var}(\bar X)} + \sqrt{\text{Var}(\bar Y)}$.
This is not the case for independent variables.
For $X,Y$ independent, $\text{Var}(\bar X-\bar Y) = \text{Var}(\bar X) + \text{Var}(\bar Y)$
Further,
$\text{Var}(\bar X) = \text{Var}(\frac{1}{n}\sum_iX_i) = \frac{1}{n^2}\text{Var}(\sum_iX_i)= \frac{1}{n^2}\sum_i\text{Var}(X_i)= \frac{1}{n^2}\cdot n\cdot\sigma^2_1= \sigma^2_1/n$
(if the $X_i$ are independent of each other).
http://en.wikipedia.org/wiki/Variance#Basic_properties
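A quick Monte Carlo check of this identity (with an arbitrary $\sigma$ and $n$, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 2.0
reps = 100_000

# Each row is one sample of n independent draws; take its mean.
xbar = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)

print(xbar.var())    # empirical variance of the sample mean
print(sigma**2 / n)  # theoretical sigma^2 / n = 0.08
```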
In summary, the correct term:
$\color{red}{(1)}$ has $\sigma^2/n$ terms because we're looking at averages and that's the variance of an average of independent random variables;
$\color{red}{(2)}$ has a $+$ because the two samples are independent, so their variances (of the averages) add; and
$\color{red}{(3)}$ has a square root because we want the standard deviation of the distribution of the difference in sample means (the standard error of the difference in means). The quantity under the square root is the variance of the difference (the square of the standard error); taking its square root gives the standard error.
The reason we don't just add standard errors is that standard errors don't add: the standard error of the difference in means is NOT the sum of the standard errors of the sample means for independent samples; the sum will always be too large. The variances do add, though, so we can use them to work out the standard error.
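A small simulation sketch (with arbitrary $\sigma$'s and $n$'s) makes the point concrete: adding variances and then taking the square root matches the empirical standard error of the difference, while adding the standard errors overshoots.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, sigma1, sigma2 = 30, 40, 2.0, 3.0
reps = 100_000

# Sample means from many independent replications of each sample.
xbar = rng.normal(0.0, sigma1, size=(reps, n1)).mean(axis=1)
ybar = rng.normal(0.0, sigma2, size=(reps, n2)).mean(axis=1)

se1, se2 = sigma1 / np.sqrt(n1), sigma2 / np.sqrt(n2)

print((xbar - ybar).std())       # empirical SE of the difference
print(np.sqrt(se1**2 + se2**2))  # correct: add variances, then take the root
print(se1 + se2)                 # too large: standard errors do not add
```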
Here's some intuition about why it's variances that add, rather than standard deviations.
To make things a little simpler, just consider adding random variables.
If $Z = X+Y$, why is $\sigma_Z \leq \sigma_X+\sigma_Y$?
Imagine $Y = kX$ for some $k > 0$; that is, $X$ and $Y$ are perfectly positively linearly dependent. They always 'move together' in the same direction and in proportion.
Then $Z = (k+1)X$, which is simply a rescaling. Since $\sigma_Y = k\sigma_X$, clearly $\sigma_Z = (k+1)\sigma_X = \sigma_X+\sigma_Y$.
That is, when $X$ and $Y$ are perfectly positively linearly dependent, always moving up or down together, standard deviations add.
When they don't always move up or down together, sometimes they move opposite directions. That means that their movements partly 'cancel out', yielding a smaller standard deviation than the direct sum.
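A quick simulation sketch of this intuition (standard normal $X$, with $Y$ constructed to have a chosen correlation): only at perfect positive correlation do the standard deviations add; otherwise the standard deviation of the sum falls short of the direct sum.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)
noise = rng.normal(size=x.size)

for rho in [1.0, 0.5, 0.0, -0.5]:
    # Build y with correlation rho to x (both are standard normal).
    y = rho * x + np.sqrt(1 - rho**2) * noise
    # sd of the sum vs. sum of the sds: equal only when rho = 1.
    print(rho, (x + y).std(), x.std() + y.std())
```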
In the first situation, in two groups $i\in\{1,2\}$ of $n_i$ binary responses you received $K_i$ positive responses and $P_i = K_i/n_i$ is the proportion. (I use capital letters to denote random variables.) Equivalently, $K_i = P_i n_i.$
Under the null hypothesis, each response is independently random and the chance of a positive result is $\pi,$ say. Consequently $K_1 + K_2$ has a Binomial$(n_1+n_2,\pi)$ distribution and you may estimate $\pi$ with the overall fraction of positives,
$$\hat\pi = \frac{K_1+K_2}{n_1+n_2}.$$
Still assuming the null, each $K_i$ independently follows a Binomial$(n_i,\pi)$ distribution and therefore has a variance of $n_i\pi(1-\pi).$ The sample means are $P_i=K_i/n_i.$ The variance of their difference therefore is
$$\begin{aligned} \operatorname{Var}\left(P_2-P_1\right)&= \operatorname{Var}\left(\frac{K_2}{n_2}-\frac{K_1}{n_1}\right)\\&= \frac{1}{n_2^2}\operatorname{Var}(K_2) + \frac{1}{n_1^2}\operatorname{Var}(K_1)\\&= \frac{n_2\pi(1-\pi)}{n_2^2} + \frac{n_1\pi(1-\pi)}{n_1^2}\\&= \pi(1-\pi)\left(\frac{1}{n_2}+\frac{1}{n_1}\right). \end{aligned}$$
To apply this, you use your estimate $P=\hat\pi$ in place of $\pi.$ Plugging that in and taking the square root gives the pooling formula you quote,
$$\sqrt{P(1-P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$
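As a sanity check, here is a small simulation sketch (with made-up counts $n_1, n_2, K_1, K_2$) comparing this pooled standard error to the empirical spread of $P_2-P_1$ under the null:

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2 = 120, 150
k1, k2 = 54, 81                  # hypothetical observed positives
p_hat = (k1 + k2) / (n1 + n2)    # pooled estimate of pi under the null

# Pooled standard error of P2 - P1.
se = np.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

# Simulation under the null: both groups share success probability p_hat.
diffs = (rng.binomial(n2, p_hat, 100_000) / n2
         - rng.binomial(n1, p_hat, 100_000) / n1)
print(se, diffs.std())  # the two should agree closely
```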
For the second question: because the sample variance is the sum of squared residuals divided by $n_i-1,$ multiplying by $n_i-1$ recovers the sum of squared residuals. Under the null hypothesis all squared residuals are exchangeable, so you can add them up and divide by one less than their combined count, $n_1+n_2-1,$ to obtain an estimate of the variance based on all the data. This assumes both standard deviations were computed relative to the overall mean $P.$ When they are computed relative to their separate group means $P_i,$ a different pooling formula (one dividing by $n_1+n_2-2$) is needed instead.
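A minimal numeric sketch of that pooling (the binary data are made up; the point is only that summing the squared residuals about the overall mean and dividing by $n_1+n_2-1$ reproduces the variance of the combined sample):

```python
import numpy as np

# Hypothetical binary responses for the two groups.
x1 = np.array([1, 0, 1, 1, 0, 1, 0, 1])
x2 = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])
n1, n2 = len(x1), len(x2)

p = np.concatenate([x1, x2]).mean()  # overall mean P

# Sums of squared residuals about the overall mean (what you recover by
# multiplying a variance computed relative to P by its divisor).
ss1 = ((x1 - p) ** 2).sum()
ss2 = ((x2 - p) ** 2).sum()

pooled_var = (ss1 + ss2) / (n1 + n2 - 1)
print(pooled_var)

# Equivalent: the sample variance of all the data combined.
print(np.concatenate([x1, x2]).var(ddof=1))
```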
These are all considerations of means and variances and therefore do not rely on the Central Limit Theorem or any unstated distributional assumptions.