Maybe this simple example will help. I use it when I teach
conditional expectation.
(1) The first step is to think of ${\mathbb E}(X)$ in a new way:
as the best estimate for the value of a random variable $X$ in the absence of any information.
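To see this, choose the guess $e$ that minimizes the mean squared error
$${\mathbb E}[(X-e)^2]={\mathbb E}[X^2-2eX+e^2]={\mathbb E}(X^2)-2e{\mathbb E}(X)+e^2.$$
Differentiating with respect to $e$ gives $2e-2{\mathbb E}(X)$, which is zero at $e={\mathbb E}(X)$; since the expression is a convex quadratic in $e$, this is the minimum.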
For example, if I throw a fair die and you have to
estimate its value $X$, according to the analysis above, your best bet is to guess ${\mathbb E}(X)=3.5$.
On specific rolls of the die, this will be an over-estimate or an under-estimate, but in the long run it minimizes the mean squared error.
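As a quick check of this claim, here is a minimal Python sketch that computes the mean squared error exactly for a fair six-sided die over a grid of candidate guesses. The names `mse` and `candidates`, and the half-integer grid, are my own choices for illustration.

```python
from fractions import Fraction

FACES = range(1, 7)  # outcomes of a fair six-sided die

def mse(e):
    """Exact mean squared error E[(X - e)^2] of guessing e for a fair die."""
    return sum((Fraction(x) - e) ** 2 for x in FACES) / 6

# Candidate guesses 1, 1.5, 2, ..., 6 (an arbitrary grid, for illustration only).
candidates = [Fraction(k, 2) for k in range(2, 13)]
best = min(candidates, key=mse)

print(best, mse(best))  # 7/2 35/12
```

Any other guess on the grid, such as 3 or 4, gives a strictly larger error than $35/12$.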
(2) What happens if you do have additional information?
Suppose that I tell you that $X$ is an even number.
How should you modify your estimate to take this new information into account?
The mental process may go something like this: "Hmmm, the possible values were $\lbrace 1,2,3,4,5,6\rbrace$
but we have eliminated $1,3$ and $5$, so the remaining possibilities are $\lbrace 2,4,6\rbrace$.
Since I have no other information, they should be considered equally likely and hence the revised expectation is $(2+4+6)/3=4$".
Similarly, if I were to tell you that $X$ is odd, your revised (conditional) expectation is 3.
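The same bookkeeping can be sketched in a few lines of Python: discard the faces ruled out by the information, then average what remains. The helper name `conditional_mean` is hypothetical, and the sketch assumes a fair die so that the remaining faces are equally likely.

```python
from fractions import Fraction

FACES = range(1, 7)  # outcomes of a fair six-sided die

def conditional_mean(event):
    """E[X | event] for a fair die: average over the faces where event holds."""
    kept = [x for x in FACES if event(x)]
    return Fraction(sum(kept), len(kept))

even = conditional_mean(lambda x: x % 2 == 0)
odd = conditional_mean(lambda x: x % 2 == 1)
print(even, odd)  # 4 3
```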
(3) Now imagine that I will roll the die and I will tell you the parity of $X$; that is, I will
tell you whether the die comes up odd or even. You should now see that a single numerical response
cannot cover both cases. You would respond "3" if I tell you "$X$ is odd", while you would respond "4" if I tell you "$X$ is even".
A single numerical response is not enough because the particular piece of information that I will give you is itself random.
In fact, your response is necessarily a function of this particular piece of information.
Mathematically, this is reflected in the requirement that ${\mathbb E}(X\ |\ {\cal F})$ must be $\cal F$-measurable.
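To make "a function of the information" concrete, here is a small Python sketch in which the conditional expectation is a function of the outcome that depends on it only through its parity. The names `parity`, `cond_exp`, and `mse` are mine, and the die is assumed fair. The sketch also compares the mean squared error of this response with that of the constant guess 3.5.

```python
from fractions import Fraction

FACES = range(1, 7)  # outcomes of a fair six-sided die

def parity(x):
    """The information revealed about the roll."""
    return "even" if x % 2 == 0 else "odd"

def cond_exp(x):
    """Value of E[X | parity] on outcome x; depends on x only through parity(x)."""
    same = [y for y in FACES if parity(y) == parity(x)]
    return Fraction(sum(same), len(same))

def mse(guess):
    """Exact mean squared error of a guessing rule x -> guess(x)."""
    return sum((x - guess(x)) ** 2 for x in FACES) / 6

# One response per possible piece of information, not a single number.
responses = {parity(x): cond_exp(x) for x in FACES}
print(responses["odd"], responses["even"])  # 3 4

constant = lambda x: Fraction(7, 2)
print(mse(cond_exp), mse(constant))  # 8/3 35/12
```

Using the parity lowers the error from $35/12$ to $8/3$, which is the sense in which the extra information improves the estimate.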
I think this covers point 1 in your question, and tells you why a single real number is not sufficient.
As for point 2, you are correct: $\cal F$ in ${\mathbb E}(X\ |\ {\cal F})$
does not represent a single piece of information; rather, it describes which specific pieces of (random) information may occur.
Complex numbers are inherently two-dimensional, so you will have a set of numbers, not just one. A good "real-valued" analog is the multivariate Gaussian. In essence, what you are trying to do is similar to getting a correlation between two vectors of two variables each, which partition the dimensions of a 4-dimensional Gaussian distribution.
The complex case is a little more nuanced because the real and imaginary parts of each "complex number" must bear a certain relationship to each other.
This paper will help you a great deal.
Best Answer
If you want intuition about the covariance representing "how the two random variables move around their means with respect to one another," it is better to use the following different (but equivalent) formula.
$$\begin{align}\text{Cov}(X,Y) &= E[(X-E[X])(Y-E[Y])]\\[2ex]&= E[XY-X\,E[Y]-Y\,E[X]+E[X]\,E[Y]]\\[2ex]&=E[XY]-E[X]\,E[Y]\end{align}$$
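As a sanity check that the first and last lines agree, here is a minimal Python sketch on a small joint distribution. The example is my own assumption, not part of the question: $X$ is a fair die roll and $Y$ is the indicator that $X$ is even. The helper `expect` is hypothetical.

```python
from fractions import Fraction

# Assumed joint distribution: X is a fair die roll, Y = 1 if X is even, else 0.
outcomes = [(x, 1 if x % 2 == 0 else 0) for x in range(1, 7)]
p = Fraction(1, 6)  # each outcome is equally likely

def expect(f):
    """Exact expectation of f(X, Y) under the joint distribution above."""
    return sum(p * f(x, y) for x, y in outcomes)

ex = expect(lambda x, y: x)  # E[X] = 7/2
ey = expect(lambda x, y: y)  # E[Y] = 1/2

cov_def = expect(lambda x, y: (x - ex) * (y - ey))  # E[(X-E[X])(Y-E[Y])]
cov_short = expect(lambda x, y: x * y) - ex * ey    # E[XY] - E[X]E[Y]
print(cov_def, cov_short)  # 1/4 1/4
```

The covariance is positive here because even rolls are, on average, above the mean roll of 3.5.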