"Nevertheless if the two variables are normally distributed, then uncorrelatedness does imply independence" is a very common fallacy.
That only applies if they are jointly normally distributed.
The counterexample I have seen most often: let $X \sim N(0,1)$ and let $Y$ be an independent Rademacher variable (equal to $1$ or $-1$ with probability $0.5$ each). Then $Z=XY$ is also standard normal (clear from considering its distribution function), and $\operatorname{Cov}(X,Z)=0$: the only work is to show $\mathbb{E}(XZ)=0$, e.g. by conditioning on $Y$ and noting that $XZ$ equals $X^2$ or $-X^2$ with probability $0.5$ each. Yet the variables are clearly dependent: if I know $X>2$, then either $Z>2$ or $Z<-2$, so information about $X$ gives me information about $Z$.
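A quick simulation (a minimal sketch in NumPy; the sample size and seed are arbitrary) illustrates the point: the sample covariance of $X$ and $Z$ is near zero, yet $|Z|=|X|$ exactly, so the two are plainly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.standard_normal(n)             # X ~ N(0, 1)
y = rng.choice([-1.0, 1.0], size=n)    # independent Rademacher Y
z = x * y                              # Z = XY is also N(0, 1)

print(np.cov(x, z)[0, 1])              # sample covariance, close to 0
print(np.all(np.abs(z) == np.abs(x)))  # True: |Z| = |X|, so X and Z are dependent
```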
It's also worth bearing in mind that marginal distributions do not uniquely determine the joint distribution. Take any two real random variables $X$ and $Y$ with marginal CDFs $F_X(x)$ and $G_Y(y)$. Then for any $|\alpha|\le 1$ the function:
$$H_{X,Y}(x,y)=F_X(x)G_Y(y)\left(1+\alpha\big(1-F_X(x)\big)\big(1-G_Y(y)\big)\right)$$
will be a bivariate CDF with those marginals. (To recover the marginal $F_X(x)$ from $H_{X,Y}(x,y)$, let $y\to\infty$, so that $G_Y(y)\to 1$; vice versa for $G_Y(y)$.) Clearly, by selecting different values of $\alpha$ you obtain different joint distributions with the same marginals!
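As a quick sanity check (a minimal sketch, assuming standard uniform marginals so that $F_X(x)=x$ and $G_Y(y)=y$ on $[0,1]$), the joint CDF changes with $\alpha$ even though the marginals do not:

```python
import numpy as np

def H(x, y, alpha):
    """Joint CDF from the construction above, with uniform marginals F_X(x)=x, G_Y(y)=y."""
    return x * y * (1 + alpha * (1 - x) * (1 - y))

x = np.linspace(0, 1, 5)
for alpha in (-1.0, 0.0, 1.0):
    print(alpha, H(x, 1.0, alpha))     # marginal of X: equals x for every alpha
    print(alpha, H(0.5, 0.5, alpha))   # joint CDF at (0.5, 0.5): changes with alpha
```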
"Covariance" is used in many distinct senses. It can be
1. a property of a bivariate population,
2. a property of a bivariate distribution,
3. a property of a paired dataset, or
4. an estimator of (1) or (2) based on a sample.
Because any finite collection of ordered pairs $((x_1,y_1), \ldots, (x_n,y_n))$ can be considered an instance of any one of these four things (a population, a distribution, a dataset, or a sample), multiple interpretations of "covariance" are possible. They are not the same. Thus, some non-mathematical information is needed to determine, in any given case, what "covariance" means.
In light of this, let's revisit three statements made in the two referenced posts:
If $u,v$ are random vectors, then $\operatorname{Cov}(u,v)$ is the matrix of elements $\operatorname{Cov}(u_i,v_j).$
This is complicated, because $(u,v)$ can be viewed in two equivalent ways. The context implies $u$ and $v$ are vectors in the same $n$-dimensional real vector space and each is written $u=(u_1,u_2,\ldots,u_n)$, etc. Thus "$(u,v)$" denotes a bivariate distribution (of vectors), as in (2) above, but it can also be considered a collection of pairs $(u_1,v_1), (u_2,v_2), \ldots, (u_n,v_n)$, giving it the structure of a paired dataset, as in (3) above. However, its elements are random variables, not numbers. Regardless, these two points of view allow us to interpret "$\operatorname{Cov}$" ambiguously: would it be
$$\operatorname{Cov}(u,v) = \frac{1}{n}\left(\sum_{i=1}^n u_i v_i\right) - \left(\frac{1}{n}\sum_{i=1}^n u_i\right)\left(\frac{1}{n}\sum_{i=1}^n v_i\right),\tag{1}$$
which (as a function of the random variables $u$ and $v$) is a random variable, or would it be the matrix
$$\left(\operatorname{Cov}(u,v)\right)_{ij} = \operatorname{Cov}(u_i,v_j) = \mathbb{E}(u_i v_j) - \mathbb{E}(u_i)\mathbb{E}(v_j),\tag{2}$$
which is an $n\times n$ matrix of numbers? Only the context in which such an ambiguous expression appears can tell us which is meant, but the latter may be more common than the former.
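A small NumPy sketch (with an arbitrary joint distribution for the random vectors; all names are illustrative) contrasts the two readings: formula $(1)$ yields one number per realization of $(u,v)$, i.e. a random variable, while formula $(2)$ yields a single $n\times n$ matrix estimated across realizations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 50_000                       # vector dimension, number of realizations

# m realizations of the random vectors u and v (correlated by construction)
u = rng.standard_normal((m, n))
v = 0.5 * u + rng.standard_normal((m, n))

# Reading (1): apply the scalar formula to each realization -> a random variable
scalar_cov = (u * v).mean(axis=1) - u.mean(axis=1) * v.mean(axis=1)
print(scalar_cov.shape)                # (m,): one value per realization

# Reading (2): the n x n matrix Cov(u_i, v_j), estimated across realizations
cross_cov = (u - u.mean(axis=0)).T @ (v - v.mean(axis=0)) / m
print(cross_cov.shape)                 # (n, n)
```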
If $u,v$ are not random vectors, then $\operatorname{Cov}(u,v)$ is the scalar $\Sigma u_i v_i$.
Maybe. This assertion understands $u$ and $v$ in the sense of a population or dataset and assumes the averages of the $u_i$ and $v_i$ in that dataset are both zero (in which case $\sum_i u_i v_i$ equals the covariance up to the factor $1/n$). More generally, for such a dataset, their covariance would be given by formula $(1)$ above.
Another nuance is that in many circumstances $(u,v)$ represents a sample of a bivariate population or distribution. That is, it is considered not as an ordered pair of vectors but as a dataset $(u_1,v_1), (u_2,v_2), \ldots, (u_n,v_n)$ in which each $(u_i,v_i)$ is an independent realization of a common random variable $(U,V)$. Then "covariance" most likely refers to an estimate of $\operatorname{Cov}(U,V)$, such as
$$\operatorname{Cov}(u,v) = \frac{1}{n-1}\left(\sum_{i=1}^n u_i v_i - \frac{1}{n}\left(\sum_{i=1}^n u_i\right)\left(\sum_{i=1}^n v_i\right)\right).$$
This is the fourth sense of "covariance."
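As a quick check (a minimal sketch with arbitrary simulated data), this estimator agrees with the unbiased sample covariance that `np.cov` returns by default:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
u = rng.standard_normal(n)
v = 0.3 * u + rng.standard_normal(n)

# The estimator written above, with the 1/(n-1) factor
cov_hat = (np.sum(u * v) - np.sum(u) * np.sum(v) / n) / (n - 1)

print(cov_hat)
print(np.cov(u, v)[0, 1])              # matches: np.cov also uses the n-1 denominator
```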
If two vectors are not random, then their covariance is zero.
This is an unusual interpretation. It must be thinking of "covariance" in the sense of formula $(2)$ above,
$$\left(\operatorname{Cov}(u,v)\right)_{ij} = \operatorname{Cov}(u_i,v_j) = 0,$$
where each $u_i$ and $v_j$ is considered, in effect, a random variable that happens to be a constant, so that every such covariance vanishes.
In a regression context (where vectors, numbers, and random variables all occur together) some of these distinctions are further elaborated in the thread on variance and covariance in the context of deterministic values.
Best Answer
For binary (0/1-valued) variables, the expected value equals the probability of being equal to one. Therefore,
$$ E(XY) = P(XY = 1) = P(X=1 \cap Y=1) \\ E(X) = P(X=1) \\ E(Y) = P(Y=1) \\ $$
If the two have zero covariance, then $E(XY) = E(X)E(Y)$, which means
$$ P(X=1 \cap Y=1) = P(X=1) \cdot P(Y=1) $$
It is straightforward to check that all other joint probabilities factor as well, using the basic rules about independent events (i.e. if $A$ and $B$ are independent, then so are $A$ and $B$'s complement, etc.): for example, $P(X=1 \cap Y=0) = P(X=1) - P(X=1 \cap Y=1) = P(X=1)\big(1 - P(Y=1)\big) = P(X=1)P(Y=0)$. Hence the joint mass function factorizes, which is the definition of two random variables being independent.
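A tiny numerical check (a minimal sketch with arbitrary marginal probabilities) makes the argument concrete: once $P(X=1, Y=1)$ equals the product of the marginals, every cell of the joint mass function factorizes.

```python
import itertools
import numpy as np

p, q = 0.3, 0.7                        # arbitrary marginals P(X=1), P(Y=1)

# Zero covariance for 0/1 variables forces P(X=1, Y=1) = p * q;
# the remaining cells are then determined by the marginals.
joint = {
    (1, 1): p * q,
    (1, 0): p - p * q,
    (0, 1): q - p * q,
    (0, 0): 1 - p - q + p * q,
}

marg_x = {1: p, 0: 1 - p}
marg_y = {1: q, 0: 1 - q}

# Every joint probability equals the product of the marginals -> independence
for x, y in itertools.product((0, 1), repeat=2):
    assert np.isclose(joint[(x, y)], marg_x[x] * marg_y[y])
print("joint pmf factorizes")
```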