Just to be clear: The expectation of an Indicator Random Variable is the probability of it being 1.
$$\begin{align}\mathsf E(X_i) & = 0\cdot \mathsf P(X_i=0)+1\cdot \mathsf P(X_i=1)\\ & = \mathsf P(X_i=1)\end{align}$$
$X_i$ is the indicator that red ball #$i$, for $i\in\{1,\dots,10\}$, is one of the $12$ out of $30$ balls drawn.
Imagine we lay the balls in a row and say the first $12$ positions are the ones drawn. By symmetry, red ball #$i$ is equally likely to occupy any of the $30$ positions, so it lands among the favoured first $12$ in $12$ of every $30$ arrangements.
Thus $\mathsf P(X_i=1)=\frac{12}{30}$
Thus $\mathsf E(X) = \sum_{i=1}^{10}\mathsf E(X_i) = \sum_{i=1}^{10} \frac{12}{30} = 4$
One of the things new students often find counter-intuitive is that Linearity of Expectation does not require the summed random variables to be independent.
Similarly $\mathsf E(Y) = \sum_{j=1}^8 \mathsf E(Y_j) = 8\cdot\frac{12}{30} = 3.2$
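As a sanity check, the indicator argument can be verified with exact arithmetic. A minimal sketch in Python, assuming only the counting argument above (fix one ball as drawn, choose the remaining $11$ drawn balls from the other $29$):

```python
from fractions import Fraction
from math import comb

# P(a specific ball is among the 12 drawn from 30): fix that ball as drawn,
# then choose the other 11 drawn balls from the remaining 29.
p_drawn = Fraction(comb(29, 11), comb(30, 12))   # = 12/30

# Linearity of expectation: just sum the indicator expectations.
E_X = 10 * p_drawn   # 10 red balls
E_Y = 8 * p_drawn    # 8 balls of the other colour
print(E_X, E_Y)      # 4 and 16/5 (= 3.2)
```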
However, when it comes to the expectation of products of random variables, you do need to consider their joint probability, because $$\begin{align}\mathsf E(XY) & = \mathsf E\Big(\big(\textstyle\sum_{i=1}^{10}X_i\big)\big(\textstyle\sum_{j=1}^8 Y_j\big)\Big) \\ & = \sum_{i=1}^{10}\sum_{j=1}^8\mathsf E(X_iY_j) \\ & = \sum_{i=1}^{10}\sum_{j=1}^8\mathsf P(X_i=1,Y_j=1)\end{align}$$
Can you complete?
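If you want to check your closed form numerically once you have it, a short simulation will do. The ball labelling below (reds as $0$–$9$, the $Y$-colour as $10$–$17$) is just an illustrative assumption:

```python
import random

random.seed(1)

# Hypothetical labelling: balls 0-9 are the 10 red balls, balls 10-17 the 8
# balls counted by Y; 12 of the 30 balls are drawn in each trial.
def trial():
    drawn = set(random.sample(range(30), 12))
    x = sum(1 for i in range(10) if i in drawn)      # X: red balls drawn
    y = sum(1 for j in range(10, 18) if j in drawn)  # Y: the other colour drawn
    return x * y

n = 200_000
estimate = sum(trial() for _ in range(n)) / n
print(estimate)  # compare with your closed-form value of E(XY)
```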
If you want intuition about the covariance representing "how the two random variables move around their means with respect to one another," it is better to use the following different (but equivalent) formula.
$$\begin{align}\text{Cov}(X,Y) &= E[(X-E[X])(Y-E[Y])]\\[2ex]&= E[XY-X~E(Y)-Y~E(X)+E(X)~E(Y)]\\[2ex]&=E(XY)-E(X)~E(Y)\end{align}$$
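The equivalence of the two forms holds for empirical moments too, so it can be seen on any finite sample. A minimal sketch with synthetic correlated data (the data itself is arbitrary):

```python
import random

random.seed(0)
xs = [random.random() for _ in range(1000)]
ys = [x + random.random() for x in xs]   # correlated with xs by construction

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)

# Definition: average of the centred products
cov_centred = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
# Shortcut: E(XY) - E(X)E(Y), applied to the same sample moments
cov_shortcut = mean([x * y for x, y in zip(xs, ys)]) - mx * my

print(abs(cov_centred - cov_shortcut))  # ~0 up to float rounding
```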
Best Answer
For the discrete case, and if $X$ is nonnegative, $E[X] = \sum_{x=0}^\infty x P(X = x)$. That means we're adding up $P(X = 0)$ zero times, $P(X = 1)$ once, $P(X = 2)$ twice, etc. This can be represented in array form, where we're adding column-by-column:
$$\begin{matrix} P(X=1) & P(X = 2) & P(X = 3) & P(X = 4) & P(X = 5) & \cdots \\ & P(X = 2) & P(X = 3) & P(X = 4) & P(X = 5) & \cdots \\ & & P(X = 3) & P(X = 4) & P(X = 5) & \cdots \\ & & & P(X = 4) & P(X = 5) & \cdots \\ & & & & P(X = 5) & \cdots\end{matrix}.$$
We could also add up these numbers row-by-row, though, and get the same result. The first row has everything but $P(X = 0)$ and so sums to $P(X > 0)$. The second row has everything but $P(X =0)$ and $P(X = 1)$ and so sums to $P(X > 1)$. In general, the sum of row $x+1$ is $P(X > x)$, and so adding the numbers row-by-row gives us $\sum_{x = 0}^{\infty} P(X > x)$, which thus must also be equal to $\sum_{x=0}^\infty x P(X = x) = E[X].$
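The row-by-row identity is easy to confirm on a distribution with finite support; the number of heads in $5$ fair coin flips below is my choice of example, not from the question:

```python
from fractions import Fraction
from math import comb

# pmf of X = number of heads in 5 fair coin flips (finite support, exact arithmetic)
pmf = {k: Fraction(comb(5, k), 2**5) for k in range(6)}

# Column-by-column: E[X] = sum over x of x * P(X = x)
E_direct = sum(k * p for k, p in pmf.items())
# Row-by-row: E[X] = sum over x of P(X > x)
E_tail = sum(sum(p for k, p in pmf.items() if k > x) for x in range(6))

print(E_direct, E_tail)  # both 5/2
```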
The continuous case is analogous: for a nonnegative continuous random variable, $E[X] = \int_0^\infty P(X > x)\,dx$.
In general, switching the order of summation (as in the proof the OP links to) can always be interpreted as adding row-by-row vs. column-by-column.