$\newcommand{\lowersub}[1]{_{\lower{0.5ex}{\small #1}}}$
The Euclidean norm of an $n$-dimensional vector $x = (x_1, x_2,\dots,x_n)^\top$ is ${\lVert x\rVert}_2 =\sqrt{\sum\limits_{k=1}^n x^2_k}$, which is a measure of the distance from the origin.
In a similar way, $\sqrt{\mathsf E\Big(\!\big(X-\mathsf E(X)\big)^2\Big)}$ is a measure of how far a random variable deviates from its mean. Thus the standard deviation may be regarded as the probability-weighted norm of the centred random variable.
( As A.S. commented, centering gives a more meaningful comparison of how two random variables deviate from their means. )
For a discrete random variable we have:
$$\lVert X-\mu\lowersub X\rVert = \sqrt{\sum_{\forall x} (x-\mu\lowersub{X})^2\mathsf P(X=x)}$$
And for a continuous real-valued random variable we have:
$$\lVert X-\mu\lowersub{X}\rVert = \sqrt{\int_\Bbb R (x-\mu\lowersub{X})^2\,f\lowersub{X}(x)\operatorname d x}$$
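As an illustrative sketch, the discrete version of this norm can be computed directly; the values and probabilities below are made up for the example:

```python
import math

# Hypothetical discrete distribution: X takes values 0, 1, 2
# with probabilities 0.2, 0.5, 0.3 (illustrative numbers).
values = [0, 1, 2]
probs = [0.2, 0.5, 0.3]

mu = sum(x * p for x, p in zip(values, probs))  # E(X)

# Probability-weighted norm of the centred variable, i.e. the
# standard deviation of X.
norm = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

print(mu, norm)
```

For these numbers the mean is $1.1$ and the norm works out to exactly $0.7$, which is the standard deviation of $X$.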
Likewise, the angle between two $n$-dimensional vectors $\vec x, \vec y$ is related to their inner product and norms by
$$\cos \alpha\lowersub{\vec x,\vec y} = \frac{\langle \vec x, \vec y\rangle}{{\lVert \vec x\rVert}_2{\lVert \vec y\rVert}_2}$$
Similarly, the covariance of two centered random variables is an inner product, and so we have the concept of correlation as the cosine of an angle.
$$\rho\lowersub{X,Y}=\dfrac{\mathsf E((X-\mu\lowersub{X})(Y-\mu\lowersub{Y}))}{\sqrt{\mathsf E((X-\mu\lowersub{X})^2)\,\mathsf E((Y-\mu\lowersub{Y})^2)}}=\dfrac{\langle X-\mu\lowersub{X} , Y-\mu\lowersub{Y}\rangle}{\lVert X-\mu\lowersub{X}\rVert\,\lVert Y-\mu\lowersub{Y}\rVert} = \cos \theta\lowersub{X,Y}$$
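A minimal numerical sketch, assuming a small hypothetical joint distribution (the four probability entries below are made up), computes $\rho$ exactly as this cosine:

```python
import math

# Hypothetical joint distribution of (X, Y): each entry is (x, y, P(X=x, Y=y)).
joint = [(0, 0, 0.4), (0, 1, 0.1), (1, 0, 0.1), (1, 1, 0.4)]

EX = sum(x * p for x, _, p in joint)
EY = sum(y * p for _, y, p in joint)

# Inner product of the centred variables (the covariance) ...
cov = sum((x - EX) * (y - EY) * p for x, y, p in joint)
# ... divided by the norms of the centred variables (the standard deviations).
sx = math.sqrt(sum((x - EX) ** 2 * p for x, _, p in joint))
sy = math.sqrt(sum((y - EY) ** 2 * p for _, y, p in joint))

rho = cov / (sx * sy)  # the cosine of the angle between X - EX and Y - EY
print(rho)
```

For this joint table the covariance is $0.15$ and both standard deviations are $0.5$, giving $\rho = 0.6$.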
As already noted in the comments, the concepts of cosine similarity and correlation are different. In particular, as explained below, the cosine of the angle between two vectors can be considered equivalent to the correlation coefficient only if the random variables have zero means. This explains why two orthogonal vectors, whose cosine similarity is zero, can show some correlation, and hence a nonzero covariance, as in the example of the OP.
Cosine similarity is obtained by taking the inner product and dividing it by the vectors’ $L^2$ norms. The formula is
$${\displaystyle CS(x,y) ={\frac {\sum \limits _{i=1}^{n}{x_{i}y_{i}}}{{\sqrt {\sum \limits _{i=1}^{n}{x_{i}^{2}}}}{\sqrt {\sum \limits _{i=1}^{n}{y_{i}^{2}}}}}}= {\langle x,y \rangle \over \| x \|\,\|{y} \|} }$$
and corresponds to the cosine of the angle between the two vectors.
Cosine similarity is bounded between $-1$ and $1$. However, in most applications where this measure is used, the vectors are non-negative, so in these cases it ranges between $0$ and $1$. Importantly, cosine similarity is invariant to scaling (i.e. multiplying all terms by a nonzero constant) but is not invariant to shifts (i.e. adding a constant to all terms).
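These two properties are easy to check numerically; the vectors below are arbitrary examples:

```python
import math

def cosine_similarity(x, y):
    """Inner product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 0.0]

base = cosine_similarity(x, y)
scaled = cosine_similarity([3 * a for a in x], y)   # scaling: value unchanged
shifted = cosine_similarity([a + 5 for a in x], y)  # shift: value changes

print(base, scaled, shifted)
```

Multiplying $x$ by $3$ leaves the cosine similarity unchanged, while adding $5$ to every component of $x$ does not.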
On the other hand, correlation can be seen as the cosine similarity measured between the centered versions of the two vectors. In fact, indicating with $\overline{x}$ and $\overline{y}$ the means, we have
$${\displaystyle r(x,y) ={\frac {\sum \limits _{i=1}^{n}{(x_{i}-\overline{x})(y_{i}-\overline{y})}}{{\sqrt {\sum \limits _{i=1}^{n}{(x_{i}-\overline{x})^{2}}}}{\sqrt {\sum \limits _{i=1}^{n}{(y_{i}-\overline{y})^{2}}}}}}} = {\langle x-\overline{x}, \,y -\overline{y}\rangle \over \| x-\overline{x} \|\,\|{y}-\overline{y} \|} $$
and therefore
$$r(x,y)=CS(x-\overline{x}, \,y -\overline{y})$$
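This identity can be verified with a short sketch; the sample vectors below are arbitrary:

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def centered(v):
    """Subtract the mean from every component."""
    m = sum(v) / len(v)
    return [a - m for a in v]

x = [1.0, 4.0, 6.0, 9.0]
y = [2.0, 3.0, 7.0, 8.0]

# Pearson correlation as the cosine similarity of the centered vectors.
r = cosine_similarity(centered(x), centered(y))
print(r)
```

Here both vectors have mean $5$, so the centered vectors are $(-4,-1,1,4)$ and $(-3,-2,2,3)$, and $r = 28/\sqrt{34\cdot 26} \approx 0.94$.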
Note that correlation is likewise bounded between $-1$ and $1$, but unlike cosine similarity it is invariant to both scaling and shifts.
We conclude that the cosine similarity is equal to the correlation coefficient only when the vectors $x$ and $y$ are centered (i.e., they have zero means).
Best Answer
This is almost correct. To give such a geometric interpretation one needs to proceed exactly as you did and define two things:

1. a vector space whose elements are the random variables, and
2. a scalar product on that space.
The interpretation for 1. is just the standard interpretation of functions as vectors. I.e. the random variables map the state space to $\mathbb{R}$, hence they are vectors just like any other real-valued function. In your case the state space is finite, hence the vector space is finite-dimensional. You can identify it with $\mathbb{R}^3$ exactly as you suggested, but you do not incorporate the probabilities! I.e. your random variable $X$ corresponds to the vector $(X(\omega_1), X(\omega_2), X(\omega_3)).$
The probabilities enter only for 2.: Observe that the expectation of the product of zero-mean random variables, $\mathbb{E}[XY]$, fulfills all conditions of a scalar product: it is bilinear and symmetric (pretty obviously), and nondegenerate, since $\mathbb{E}[X^2]=0 \implies X=0$ a.e.
So you simply define $\langle X,Y\rangle=\mathbb{E}[XY]$ and are ready to measure angles!
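A small sketch, assuming three equally likely outcomes and made-up zero-mean values, computes this scalar product and the resulting angle:

```python
import math

# Hypothetical finite state space with three equally likely outcomes.
probs = [1 / 3, 1 / 3, 1 / 3]
X = [1.0, -2.0, 1.0]   # X(omega_1), X(omega_2), X(omega_3); zero mean
Y = [2.0, -1.0, -1.0]  # also zero mean

# <X, Y> = E[XY], the scalar product defined above.
inner = sum(p * x * y for p, x, y in zip(probs, X, Y))
norm_X = math.sqrt(sum(p * x * x for p, x in zip(probs, X)))
norm_Y = math.sqrt(sum(p * y * y for p, y in zip(probs, Y)))

angle = math.acos(inner / (norm_X * norm_Y))
print(angle)
```

For these values $\langle X,Y\rangle = 1$ and both norms equal $\sqrt 2$, so the cosine is $1/2$ and the angle between $X$ and $Y$ is $\pi/3$.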