If you want intuition for the covariance as describing "how the two random variables move around their means with respect to one another," it helps to expand the definition into the following equivalent form.
$$\begin{align}\text{Cov}(X,Y) &= E[(X-E[X])(Y-E[Y])]\\[2ex]&= E[XY - X\,E[Y] - Y\,E[X] + E[X]\,E[Y]]\\[2ex]&= E[XY] - E[X]\,E[Y]\end{align}$$
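As a quick numerical check of the identity above, here is a minimal sketch (the simulated data and the coefficient $0.5$ are arbitrary choices for illustration) showing that the "average product of deviations" form and the shortcut form $E[XY]-E[X]\,E[Y]$ agree on a sample:

```python
import random

random.seed(0)
x = [random.gauss(0, 1) for _ in range(10_000)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]  # correlated with x

def mean(v):
    return sum(v) / len(v)

mx, my = mean(x), mean(y)

# Definition: average product of deviations from the means
cov_def = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

# Shortcut form: E[XY] - E[X] E[Y]
cov_alt = mean([xi * yi for xi, yi in zip(x, y)]) - mx * my

print(abs(cov_def - cov_alt) < 1e-9)  # the two forms agree
```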
As already noted in the comments, cosine similarity and correlation are different concepts. In particular, as explained below, the cosine of the angle between two vectors equals the correlation coefficient only if the random variables have zero means. This explains why two orthogonal vectors, whose cosine similarity is zero, can still show some correlation, and hence a nonzero covariance, as in the OP's example.
Cosine similarity is obtained by taking the inner product of the two vectors and dividing it by the product of their Euclidean ($L^2$) norms. The formula is
$${\displaystyle CS(x,y) ={\frac {\sum \limits _{i=1}^{n}{x_{i}y_{i}}}{{\sqrt {\sum \limits _{i=1}^{n}{x_{i}^{2}}}}{\sqrt {\sum \limits _{i=1}^{n}{y_{i}^{2}}}}}}= {\langle x,y \rangle \over \| x \|\,\| y \|} }$$
and corresponds to the cosine of the angle between the two vectors.
Cosine similarity is bounded between $-1$ and $1$. However, in most applications where this measure is used, the vectors are non-negative, so in these cases it ranges between $0$ and $1$. Importantly, cosine similarity is invariant to scaling (i.e. multiplying all terms by a positive constant) but is not invariant to shifts (i.e. adding a constant to all terms).
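These two properties are easy to verify numerically. Below is a small sketch (the vectors and constants are arbitrary examples): scaling one vector by a positive constant leaves the cosine unchanged, while adding a constant to every component changes it.

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x = [1.0, 2.0, 3.0]
y = [2.0, 0.0, 1.0]

base = cosine_similarity(x, y)

# Scaling x by a positive constant: cosine unchanged
scaled = cosine_similarity([10 * a for a in x], y)

# Shifting x (adding a constant to every component): cosine changes
shifted = cosine_similarity([a + 5 for a in x], y)

print(abs(base - scaled) < 1e-12)   # scale-invariant
print(abs(base - shifted) > 1e-3)   # not shift-invariant
```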
On the other hand, correlation can be seen as the cosine similarity measured between the centered versions of the two vectors. In fact, indicating with $\overline{x}$ and $\overline{y}$ the means, we have
$${\displaystyle r(x,y) ={\frac {\sum \limits _{i=1}^{n}{(x_{i}-\overline{x})(y_{i}-\overline{y})}}{{\sqrt {\sum \limits _{i=1}^{n}{(x_{i}-\overline{x})^{2}}}}{\sqrt {\sum \limits _{i=1}^{n}{(y_{i}-\overline{y})^{2}}}}}}} = {\langle x-\overline{x}, \,y -\overline{y}\rangle \over \| x-\overline{x} \|\,\| y-\overline{y} \|} $$
and then
$$r(x,y)=CS(x-\overline{x}, \,y -\overline{y})$$
It is worth noting that correlation is also bounded between $-1$ and $1$, but unlike cosine similarity it is invariant to both scaling (by a positive constant) and shifts.
We conclude that the cosine similarity is equal to the correlation coefficient only when the vectors $x$ and $y$ are centered (i.e., they have zero means).
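The relation $r(x,y)=CS(x-\overline{x},\,y-\overline{y})$ can be sketched directly (the sample vectors below are arbitrary examples, chosen so that the means are nonzero):

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation(x, y):
    # Correlation = cosine similarity of the centered vectors
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])

x = [1.0, 2.0, 4.0, 7.0]
y = [1.0, 3.0, 2.0, 5.0]

# Since x and y do not have zero means, the plain cosine similarity
# and the correlation coefficient disagree.
print(correlation(x, y))
print(cosine_similarity(x, y))
```

A sanity check on the `correlation` helper: for exactly linearly related data, e.g. `y = 2x`, it returns $1$.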
Best Answer
The space $L^0(\Omega)$ of all random variables on a fixed sample space $\Omega$ is a vector space - the (outcome-wise) sum of two random variables is a random variable, and a scalar multiple of a random variable is again a random variable. So in that sense, random variables can be viewed as "vectors" because they are the elements of a vector space.
By "dot product" they likely mean the $L^2$ inner product, defined by $\langle X, Y \rangle = E[XY]$. This obeys the same basic algebraic properties as the ordinary Euclidean dot product: bilinear (with respect to the addition and scalar multiplication described above), symmetric, positive definite. Strictly speaking, this inner product doesn't necessarily live on $L^0(\Omega)$, but rather on the vector subspace $L^2(\Omega) \subset L^0(\Omega)$ consisting of random variables with finite second moment.
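On a finite sample space these properties can be checked concretely. The sketch below (the outcome probabilities and the values of the random variables are made-up examples) models random variables as lists of values indexed by outcome, defines $\langle X, Y \rangle = E[XY]$, and verifies bilinearity, symmetry, and positivity:

```python
# A finite sample space: three outcomes with given probabilities.
p = [0.2, 0.3, 0.5]          # probability of each outcome
X = [1.0, -1.0, 2.0]         # value of X on each outcome
Y = [0.0, 4.0, 1.0]
Z = [3.0, 0.5, -2.0]

def inner(a, b):
    """L^2 inner product <A, B> = E[AB] on this sample space."""
    return sum(pi * ai * bi for pi, ai, bi in zip(p, a, b))

# Bilinearity in the first argument: <2X + Z, Y> = 2<X, Y> + <Z, Y>
lhs = inner([2 * xi + zi for xi, zi in zip(X, Z)], Y)
rhs = 2 * inner(X, Y) + inner(Z, Y)
print(abs(lhs - rhs) < 1e-12)

# Symmetry and positivity of <X, X> = E[X^2]
print(inner(X, Y) == inner(Y, X))
print(inner(X, X) >= 0)
```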