The idea is that you can start with any estimator of $(1-\theta)^2$, no matter how awful, provided it is unbiased. The Rao-Blackwell process will almost magically turn it into a uniformly minimum-variance unbiased estimator (UMVUE).
There are many ways to proceed. One fruitful idea is systematically to remove the complications in the expression "$(1-\theta)^2$". This leads to a sequence of questions:
(1) How to find an unbiased estimator of $(1-\theta)^2$?
(2) How to find an unbiased estimator of $1-\theta$?
(3) How to find an unbiased estimator of $\theta$?
The answer to (3), at least, should be obvious: any of the $X_i$ will be an unbiased estimator because
$$\mathbb{E}_\theta(X_i) = \theta.$$
(It doesn't matter how you come up with this estimator: by guessing and checking (which often works), Maximum Likelihood, or whatever. ML, incidentally, often is not helpful because it tends to produce biased estimators. What is helpful is the extreme simplicity of working with a single observation rather than a whole bunch of them.)
Linearity of expectation tells us an answer to (2) would be any of the $1 - X_i$, because
$$\mathbb{E}_\theta(1-X_i) = 1 - \mathbb{E}_\theta(X_i) = 1-\theta.$$
Getting from this to an answer to (1) is the crux of the matter. At some point you will need to exploit the fact that you have more than one independent realization of this Bernoulli variable, because it quickly becomes obvious that a single $0$-$1$ observation just can't tell you much. For instance, the square of $1-X_1$ won't work: since $(1-X_1)^2 = 1-X_1$,
$$\mathbb{E}_\theta((1-X_1)^2) = 1-\theta.$$
What could be done with two of the observations, such as $X_1$ and $X_2$? A little thought might eventually suggest considering their product. Sure enough, because $X_1$ and $X_2$ are independent, their expectations multiply:
$$\mathbb{E}_\theta((1-X_1)(1-X_2)) = \mathbb{E}_\theta((1-X_1))\mathbb{E}_\theta((1-X_2)) = (1-\theta)(1-\theta)=(1-\theta)^2.$$
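If you want to convince yourself numerically first, a two-line simulation in R does it (just a sketch; $\theta = 0.6$ is an arbitrary choice):

    # Sanity check by simulation: E[(1-X1)(1-X2)] should equal (1-theta)^2.
    set.seed(17)
    theta <- 0.6                      # arbitrary illustrative value
    x1 <- rbinom(1e6, 1, theta)
    x2 <- rbinom(1e6, 1, theta)
    mean((1 - x1) * (1 - x2))         # simulated expectation
    (1 - theta)^2                     # target value: 0.16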
You're now good to go: apply the Rao-Blackwell process to the unbiased estimator $T=(1-X_1)(1-X_2)$ to obtain a UMVUE. (That is, find its expectation conditional on $\sum X_i$.) I'll stop here so that you can have the fun of discovering the answer for yourself: it's marvelous to see what kinds of formulas can emerge from this process.
To illustrate the calculation let's take the simpler case of three, rather than four, $X_i$. The sum $S=X_1+X_2+X_3$ counts how many of the $X_i$ equal $1$. Look at the four possibilities:
When $S=0$, all the $X_i=0$ and $T = 1$ constantly, whence $\mathbb{E}(T\,|\,S=0)=1$.
When $S=1$, there are three possible configurations of the $X_i$: $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$. All are equally likely, giving each a chance of $1/3$. The value of $T$ is $0$ for the first two and $1$ for the last. Therefore
$$\mathbb{E}(T\,|\,S=1) = \left(\frac{1}{3}\right)\left(0\right)+\left(\frac{1}{3}\right)\left(0\right)+\left(\frac{1}{3}\right)\left(1\right) = \frac{1}{3}.$$
When $S=2$ or $S=3$, $T=0$ no matter what order the $X_i$ appear in, giving $0$ for the conditional expectation.
The Rao-Blackwellized version of $T$, then, is the estimator that associates with the sum $S$ the following guesses for $(1-\theta)^2$:
$$\tilde T(0)=1,\ \tilde T(1)=1/3,\ \tilde T(2)=\tilde T(3)=0.$$
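The same conditional expectations can be recovered by brute-force enumeration; here is a sketch in R (the names `configs`, `t.val`, and so on are my own):

    # Enumerate all 2^3 configurations of (X1, X2, X3), weight each by its
    # probability, and average T = (1 - X1)(1 - X2) within each value of S.
    configs <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)
    theta   <- 0.6   # any value in (0, 1) yields the same conditional expectations
    s       <- rowSums(configs)
    p       <- theta^s * (1 - theta)^(3 - s)          # P(X1, X2, X3) for each row
    t.val   <- (1 - configs$x1) * (1 - configs$x2)
    tapply(p * t.val, s, sum) / tapply(p, s, sum)     # E(T | S) for S = 0, 1, 2, 3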
As a check, the expectation of $\tilde T$ can be computed as
$$\begin{aligned}
\mathbb{E}(\tilde T) &= \Pr(S=0)\tilde{T}(0) + \Pr(S=1)\tilde{T}(1) + \Pr(S=2)\tilde{T}(2) + \Pr(S=3)\tilde{T}(3) \\
&= (1-\theta)^3 + \binom{3}{1}\theta(1-\theta)^2\left(1/3\right) + 0 + 0 \\
&= 1 - 3 \theta + 3 \theta^2 - \theta^3 + 3(1/3)(\theta - 2\theta^2 + \theta^3) \\
&= 1 - 2\theta + \theta^2 \\
&= (1-\theta)^2,
\end{aligned}$$
showing it is unbiased. A similar calculation gives its variance (which is useful to know, since it is supposed to be the smallest possible among unbiased estimators).
Note that these calculations required little more than applying the definition of expectation and computing binomial probabilities.
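In R, for instance, the same check takes a few lines (a sketch; `T.rb` is my name for $\tilde T$):

    # Verify E(T~) = (1 - theta)^2 and compute Var(T~) from binomial probabilities.
    theta <- 0.6                      # illustrative value
    probs <- dbinom(0:3, 3, theta)    # P(S = s) for S ~ Binomial(3, theta)
    T.rb  <- c(1, 1/3, 0, 0)          # values of the Rao-Blackwellized estimator
    sum(probs * T.rb)                 # expectation: equals (1 - theta)^2 = 0.16
    sum(probs * T.rb^2) - sum(probs * T.rb)^2   # its variance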
Let us show that there can be a UMVUE which is not a sufficient statistic.
First of all, if the estimator $T$ takes (say) the value $0$ on all samples, then clearly $T$ is a UMVUE of $0$, which can be considered a (constant) function of $\theta$. On the other hand, this estimator $T$ is clearly not sufficient in general.
It is a bit harder to find a UMVUE $Y$ of the "entire" unknown parameter $\theta$ (rather than a UMVUE of a function of it) such that $Y$ is not sufficient for $\theta$. E.g., suppose the "data" are given just by one normal r.v. $X\sim N(\tau,1)$, where $\tau\in\mathbb{R}$ is unknown. Clearly, $X$ is sufficient and complete for $\tau$.
Let $Y=1$ if $X\ge0$ and $Y=0$ if $X<0$, and let
$\theta:=\mathsf{E}_\tau Y=\mathsf{P}_\tau(X\ge0)=\Phi(\tau)$; as usual, we denote by $\Phi$ and $\varphi$, respectively, the cdf and pdf of $N(0,1)$.
So, the estimator $Y$ is unbiased for $\theta=\Phi(\tau)$ and is a function of the complete sufficient statistic $X$. Hence,
$Y$ is a UMVUE of $\theta=\Phi(\tau)$.
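A quick simulation confirms the unbiasedness (a sketch, with an arbitrary choice of $\tau$):

    # Check that Y = 1{X >= 0} is unbiased for theta = Phi(tau).
    set.seed(42)
    tau <- 1.5                        # arbitrary illustrative value
    x   <- rnorm(1e6, mean = tau)     # draws of X ~ N(tau, 1)
    mean(x >= 0)                      # simulated E(Y)
    pnorm(tau)                        # theta = Phi(tau), about 0.9332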
On the other hand, the function $\Phi$ is continuous and strictly increasing on $\mathbb{R}$, from $0$ to $1$. So, the correspondence $\mathbb{R}\ni\tau=\Phi^{-1}(\theta)\leftrightarrow\theta=\Phi(\tau)\in(0,1)$ is a bijection. That is, we can re-parametrize the problem, from $\tau$ to $\theta$, in a one-to-one manner. Thus, $Y$ is a UMVUE of $\theta$, not just for the "old" parameter $\tau$ but for the "new" parameter $\theta\in(0,1)$ as well. However, $Y$ is not sufficient for $\tau$, and therefore not sufficient for $\theta$. Indeed,
\begin{multline*}
\mathsf{P}_\tau(X<-1|Y=0)=\mathsf{P}_\tau(X<-1|X<0)=\frac{\mathsf{P}_\tau(X<-1)}{\mathsf{P}_\tau(X<0)} \\
=\frac{\Phi(-\tau-1)}{\Phi(-\tau)}
\sim\frac{\varphi(-\tau-1)/(\tau+1)}{\varphi(-\tau)/\tau}\sim\frac{\varphi(-\tau-1)}{\varphi(-\tau)}=e^{-\tau-1/2}
\end{multline*}
as $\tau\to\infty$; here we used the known asymptotic equivalence $\Phi(-\tau)\sim\varphi(-\tau)/\tau$ as $\tau\to\infty$, which follows from l'Hôpital's rule.
So, $\mathsf{P}_\tau(X<-1|Y=0)$ depends on $\tau$ and hence on $\theta$, which shows that $Y$ is not sufficient for $\theta$ (whereas $Y$ is a UMVUE for $\theta$).
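This is easy to see numerically as well; a sketch in R, where `pnorm` is $\Phi$:

    # P_tau(X < -1 | X < 0) = Phi(-1 - tau) / Phi(-tau); if Y were sufficient
    # for theta, this conditional probability could not depend on tau.
    tau <- c(0, 2, 4, 8)
    pnorm(-1 - tau) / pnorm(-tau)     # varies with tau ...
    exp(-tau - 1/2)                   # ... approaching e^(-tau - 1/2), as derived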
Best Answer
First, observe that $S = X_1X_2X_3 = 1$ if and only if $X_1 = X_2 = X_3 = 1$. Given that we have drawn $n$ samples and $T = \sum_{i=1}^n X_i$ of them are equal to one, the probability that $S = 1$ is:
$$P(S=1\mid T) = \max\left(0,\ \frac{T}{n}\cdot\frac{T-1}{n-1}\cdot\frac{T-2}{n-2}\right)$$
where the $(T-1)/(n-1)$ and $(T-2)/(n-2)$ terms come about because we are sampling without replacement from our $n$ observations; for example, once we know that $X_1 = 1$, there are only $n-1$ values left that $X_2$ might take on, and only $T-1$ of them are equal to one. If $T < 3$, the probability that $S=1$ is of course $0$, which is what the $\max$ ensures.
Consequently, since $\mathbb{E}_{\theta}(X_1X_2X_3\mid T) = 1\cdot P(S=1\mid T) + 0\cdot P(S=0\mid T) = P(S=1\mid T)$, we have
$$\mathbb{E}_{\theta}(X_1X_2X_3\mid T) = \max\left(0,\ \frac{T}{n}\cdot\frac{T-1}{n-1}\cdot\frac{T-2}{n-2}\right).$$
Let's try an experiment to see how much Rao-Blackwell has improved our estimator. We're going to use a smallish sample size, $n=10$, and set $\theta = 0.6$. The code below is horribly inefficient but, for people who may not be so familiar with R, should be pretty easy to follow:
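Something along these lines (a sketch; `n.sim` and the other names are my own choices):

    # Compare the naive unbiased estimator X1*X2*X3 of theta^3 with its
    # Rao-Blackwellized version based on T = number of ones in the sample.
    # Written as a plain loop for readability, not speed.
    set.seed(1)
    n     <- 10       # sample size
    theta <- 0.6      # true success probability
    n.sim <- 100000   # number of simulated data sets
    naive <- rb <- numeric(n.sim)
    for (i in 1:n.sim) {
      x        <- rbinom(n, 1, theta)
      naive[i] <- x[1] * x[2] * x[3]   # initial unbiased estimator
      T.obs    <- sum(x)               # T in the text
      rb[i]    <- max(0, (T.obs / n) * ((T.obs - 1) / (n - 1)) * ((T.obs - 2) / (n - 2)))
    }
    mean(naive); mean(rb)   # both should be near theta^3 = 0.216
    var(naive) / var(rb)    # variance-reduction factor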
As for the output: Rao-Blackwellization reduces the variance of the estimator by a factor of about six, far more than the factor of $10/3 \approx 3.3$, the ratio of the full sample size ($10$) to the number of observations ($3$) used to construct the initial unbiased estimator.