"Covariance" is used in many distinct senses. It can be
a property of a bivariate population,
a property of a bivariate distribution,
a property of a paired dataset, or
an estimator of (1) or (2) based on a sample.
Because any finite collection of ordered pairs $((x_1,y_1), \ldots, (x_n,y_n))$ can be considered an instance of any one of these four things (a population, a distribution, a dataset, or a sample), multiple interpretations of "covariance" are possible. They are not the same. Thus, some non-mathematical information is needed to determine what "covariance" means in any given case.
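To make the distinction concrete, here is a minimal Python sketch (using NumPy, with made-up numbers) showing that the same five pairs yield different values of "covariance" depending on whether they are treated as a complete population (divide by $n$) or as a sample (divide by $n-1$):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Treating the pairs as a complete population: divide by n.
pop_cov = np.cov(x, y, bias=True)[0, 1]   # 1.6

# Treating the same pairs as a sample: divide by n - 1.
sam_cov = np.cov(x, y, ddof=1)[0, 1]      # 2.0

print(pop_cov, sam_cov)  # different numbers for the same data
```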
In light of this, let's revisit three statements made in the two referenced posts:
> If $u,v$ are random vectors, then $\operatorname{Cov}(u,v)$ is the matrix of elements $\operatorname{Cov}(u_i,v_j).$
This is complicated, because $(u,v)$ can be viewed in two equivalent ways. The context implies $u$ and $v$ are vectors in the same $n$-dimensional real vector space and each is written $u=(u_1,u_2,\ldots,u_n)$, etc. Thus "$(u,v)$" denotes a bivariate distribution (of vectors), as in (2) above, but it can also be considered a collection of pairs $(u_1,v_1), (u_2,v_2), \ldots, (u_n,v_n)$, giving it the structure of a paired dataset, as in (3) above. However, its elements are random variables, not numbers. Regardless, these two points of view allow us to interpret "$\operatorname{Cov}$" ambiguously: would it be
$$\operatorname{Cov}(u,v) = \frac{1}{n}\left(\sum_{i=1}^n u_i v_i\right) - \left(\frac{1}{n}\sum_{i=1}^n u_i\right)\left(\frac{1}{n}\sum_{i=1}^n v_i\right),\tag{1}$$
which (as a function of the random variables $u$ and $v$) is a random variable, or would it be the matrix
$$\left(\operatorname{Cov}(u,v)\right)_{ij} = \operatorname{Cov}(u_i,v_j) = \mathbb{E}(u_i v_j) - \mathbb{E}(u_i)\mathbb{E}(v_j),\tag{2}$$
which is an $n\times n$ matrix of numbers? Only the context in which such an ambiguous expression appears can tell us which is meant, but the latter may be more common than the former.
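As an illustrative sketch (my own construction, not from the referenced posts), the two readings can be computed side by side for simulated random vectors: interpretation $(1)$ produces one scalar per realization of $(u,v)$, while interpretation $(2)$ is a fixed $n\times n$ matrix, estimated here by averaging over many realizations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 100_000                  # dimension and number of realizations
U = rng.standard_normal((m, n))    # m realizations of the random vector u
V = U + 0.5 * rng.standard_normal((m, n))  # correlated realizations of v

# Interpretation (1): a random variable -- one scalar per realization,
# averaging over the coordinates of that realization.
cov_rv = (U * V).mean(axis=1) - U.mean(axis=1) * V.mean(axis=1)
print(cov_rv.shape)                # (m,) -- a sample of that random variable

# Interpretation (2): an n x n matrix of numbers Cov(u_i, v_j),
# estimated by averaging over the m realizations.
cov_matrix = (U - U.mean(axis=0)).T @ (V - V.mean(axis=0)) / m
print(cov_matrix.shape)            # (n, n)
```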
> If $u,v$ are not random vectors, then $\operatorname{Cov}(u,v)$ is the scalar $\Sigma u_i v_i$.
Maybe. This assertion interprets $u$ and $v$ in the sense of a population or dataset and assumes the averages of the $u_i$ and $v_i$ in that dataset are both zero (in which case $\Sigma u_i v_i$ is $n$ times the covariance). More generally, for such a dataset, the covariance would be given by formula $(1)$ above.
Another nuance is that in many circumstances $(u,v)$ represents a sample from a bivariate population or distribution. That is, the pair is considered not as an ordered pair of vectors but as a dataset $(u_1,v_1), (u_2,v_2), \ldots, (u_n,v_n)$ wherein each $(u_i,v_i)$ is an independent realization of a common random variable $(U,V)$. Then "covariance" would most likely refer to an estimate of $\operatorname{Cov}(U,V)$, such as
$$\operatorname{Cov}(u,v) = \frac{1}{n-1}\left(\sum_{i=1}^n u_i v_i - \frac{1}{n}\left(\sum_{i=1}^n u_i\right)\left(\sum_{i=1}^n v_i\right)\right).$$
This is the fourth sense of "covariance."
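A quick numerical check of this estimator (a sketch with arbitrary numbers; NumPy's `np.cov` uses the same $n-1$ denominator by default):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
n = len(u)

# The unbiased estimator written out exactly as in the formula above.
cov_hat = (np.sum(u * v) - np.sum(u) * np.sum(v) / n) / (n - 1)

print(cov_hat)               # 2.0
print(np.cov(u, v)[0, 1])    # 2.0 -- np.cov agrees (ddof=1 by default)
```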
> If two vectors are not random, then their covariance is zero.
This is an unusual interpretation. It must take "covariance" in the sense of formula $(2)$ above, whence
$$\left(\operatorname{Cov}(u,v)\right)_{ij} = \operatorname{Cov}(u_i,v_j) = 0$$
for all $i$ and $j$.
Each $u_i$ and $v_j$ is considered, in effect, a random variable that happens to be a constant.
In a regression context (where vectors, numbers, and random variables all occur together) some of these distinctions are further elaborated in the thread on variance and covariance in the context of deterministic values.
Here is a geometric interpretation.
First, take two vectors in $\mathbb{R}^2$
$$\vec{\mathbb{z}}=[x,y] \,, \vec{\mathbb{w}}=[u,v]$$
For these vectors, there are two standard types of "products", the dot product
$$\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}=xu+yv$$
and the cross product*
$$\vec{\mathbb{z}}\times\vec{\mathbb{w}}=xv-yu$$
which can be interpreted as
$$\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}_\perp$$
where $\vec{\mathbb{w}}_\perp=[v,-u]$ has the same magnitude as $\vec{\mathbb{w}}$ but is orthogonal.
(*Technically this "2D cross product" is defined as $[0,0,\vec{\mathbb{z}}\times\vec{\mathbb{w}}]\equiv[x,y,0]\times[u,v,0]$.)
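Before moving on, here is a small numeric check of these two products and the "perpendicular vector" trick (a sketch; the numbers are arbitrary):

```python
import numpy as np

z = np.array([3.0, 1.0])   # [x, y]
w = np.array([2.0, 4.0])   # [u, v]

dot = z @ w                          # x*u + y*v = 10
cross = z[0] * w[1] - z[1] * w[0]    # x*v - y*u = 10
w_perp = np.array([w[1], -w[0]])     # [v, -u]: same magnitude, orthogonal

print(cross, z @ w_perp)             # the cross product equals z . w_perp
print(np.isclose(w @ w_perp, 0.0))   # w_perp really is orthogonal to w
```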
In terms of geometric intuition, the dot product between two vectors measures how well they align (think correlation), but also their relative magnitudes (think standard deviations), i.e.
$$\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}=||\vec{\mathbb{z}}||\,||\vec{\mathbb{w}}||\,\cos[\theta]$$
where $\theta$ is the angle between them (compare to $\sigma_{xy}=\sigma_x\sigma_y\rho_{xy}$).
Note that the dot product can also be written as $\mathbb{z}^T\mathbb{w}$, where $\mathbb{z}$ and $\mathbb{w}$ are just $\vec{\mathbb{z}}$ and $\vec{\mathbb{w}}$ written as column vectors.
Now let us do the same thing with two scalars in the complex plane ($z,w\in\mathbb{C}$), i.e.
$$z=x+iy \,, w=u+iv$$
What is the equivalent to the "dot product" here? It is actually the same as above, but now using the conjugate transpose, i.e. $z^*\equiv\bar{z}^T$ (also written as $z^\dagger$).
Since the transpose of a scalar is just that same scalar, the complex dot product is then
$$z^{\dagger}w=\bar{z}w=(x-iy)(u+iv)=(xu+yv)+i(xv-yu)$$
We can immediately notice two things. First, the complex dot product is equivalent to
$$z^{\dagger}w=(\vec{\mathbb{z}}\cdot\vec{\mathbb{w}})+i(\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}_\perp)$$
i.e. it is a complex number whose real component is the dot product of the corresponding 2-vectors, and whose imaginary component is their cross product. Second, since $\bar{x}=x$ for $x\in\mathbb{R}$, the vector dot product we started with can be written as $\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}=\mathbb{z}^{\dagger}\mathbb{w}$ (i.e. we were really using the conjugate transpose all along).
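A direct check of this correspondence (a sketch, reusing the arbitrary values from above):

```python
import numpy as np

x, y, u, v = 3.0, 1.0, 2.0, 4.0
z = complex(x, y)                 # z = x + iy
w = complex(u, v)                 # w = u + iv

cdot = np.conj(z) * w             # the complex "dot product" z† w
print(cdot.real, x*u + y*v)       # real part = vector dot product (10.0)
print(cdot.imag, x*v - y*u)       # imaginary part = cross product (10.0)
```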
Now for the covariance matrix.
For simplicity, let us assume that all random variables have zero mean. Then the covariance is defined as
$$\sigma_{z,w}\equiv\mathrm{Cov}[z,w]\equiv\mathbb{E}[\bar{z}w]$$
so we have
\begin{align}
\mathrm{Cov}[z,w] &= \mathrm{Re}[\sigma_{z,w}]+i\,\mathrm{Im}[\sigma_{z,w}] \\
&= \,\mathbb{E}[\,\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}\,]+i\,\mathbb{E}[\,\vec{\mathbb{z}}\cdot\vec{\mathbb{w}}_\perp]
\end{align}
The real (imaginary) component of $\sigma_{z,w}$ is the expected value of the dot (cross) product of the associated vectors $\vec{\mathbb{z}}$ and $\vec{\mathbb{w}}$.
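A Monte Carlo sketch of this decomposition (my own construction; the components are built to be zero-mean and correlated):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200_000
# Zero-mean, correlated real components.
x, y = rng.standard_normal(m), rng.standard_normal(m)
u = 0.8 * x + 0.2 * rng.standard_normal(m)
v = 0.5 * y + 0.5 * rng.standard_normal(m)

z, w = x + 1j * y, u + 1j * v

cov_zw = np.mean(np.conj(z) * w)          # E[conj(z) w]
print(cov_zw.real, np.mean(x*u + y*v))    # real part = E[dot product]
print(cov_zw.imag, np.mean(x*v - y*u))    # imaginary part = E[cross product]
```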
This is the main intuition. The rest of this answer is just for completeness.
If our random variable is a column vector $\mathbb{z}=[z_1,\ldots,z_n]\in\mathbb{C}^n$ with covariance matrix $\boldsymbol{\Sigma}\in\mathbb{C}^{n\times n}$, then we have
$$\Sigma_{ij}=\mathrm{Cov}[z_i,z_j]$$
Finally, if we have $m$ samples of the random variable $\mathbb{z}$, arranged as the rows of a data matrix $\boldsymbol{Z}\in\mathbb{C}^{m\times n}$, then $\boldsymbol{\Sigma}$ can be approximated by the sample* covariance
$$\boldsymbol{\Sigma}\approx\tfrac{1}{m}\boldsymbol{Z}^{\dagger}\boldsymbol{Z}$$
(*yes, I divided by $m$, so you can call it the "biased" sample covariance if you must.)
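A final sketch of this estimator (assuming zero-mean columns, as above; be aware that library conventions can differ by a conjugate transpose):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100_000, 3
Z = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
Z[:, 1] += 0.5 * Z[:, 0]                  # introduce some correlation

Sigma = Z.conj().T @ Z / m                # (1/m) Z† Z, zero mean assumed

# Entry (i, j) matches the elementwise definition E[conj(z_i) z_j] ...
print(np.isclose(Sigma[0, 1], np.mean(np.conj(Z[:, 0]) * Z[:, 1])))
# ... and the result is Hermitian with a real, nonnegative diagonal.
print(np.allclose(Sigma, Sigma.conj().T), np.all(Sigma.diagonal().real >= 0))
```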
Best Answer
EDIT: My actual answer is that quaternions are difficult to visualize (you are not alone in that), hence the conversion to Euler angles. The thesis is referenced in case you wanted the covariance laws, and to add to the rationale above.
The relevant material comes from a thesis; the full thesis PDF is linked below.
You may also try locating this book: Vanicek, P. and E.J. Krakiwsky (1986): Geodesy: The Concepts, North-Holland, Amsterdam.
source: www.ucalgary.ca/engo_webdocs/GL/96.20096.JSchleppe.pdf