I read in my textbook that $\text{cov}(X,Y)=0$ does not guarantee that X and Y are independent, but if they are independent, their covariance must be 0. I have not yet been able to think of a proper example; could someone provide one?
Independence vs Covariance – Understanding Key Differences
Related Solutions
"Covariance" is used in many distinct senses. It can be
(1) a property of a bivariate population,
(2) a property of a bivariate distribution,
(3) a property of a paired dataset, or
(4) an estimator of (1) or (2) based on a sample.
Because any finite collection of ordered pairs $((x_1,y_1), \ldots, (x_n,y_n))$ can be considered an instance of any one of these four things--a population, a distribution, a dataset, or a sample--multiple interpretations of "covariance" are possible. They are not the same. Thus, some non-mathematical information is needed in order to determine in any case what "covariance" means.
In light of this, let's revisit three statements made in the two referenced posts:
If $u,v$ are random vectors, then $\operatorname{Cov}(u,v)$ is the matrix of elements $\operatorname{Cov}(u_i,v_j).$
This is complicated, because $(u,v)$ can be viewed in two equivalent ways. The context implies $u$ and $v$ are vectors in the same $n$-dimensional real vector space and each is written $u=(u_1,u_2,\ldots,u_n)$, etc. Thus "$(u,v)$" denotes a bivariate distribution (of vectors), as in (2) above, but it can also be considered a collection of pairs $(u_1,v_1), (u_2,v_2), \ldots, (u_n,v_n)$, giving it the structure of a paired dataset, as in (3) above. However, its elements are random variables, not numbers. Regardless, these two points of view allow us to interpret "$\operatorname{Cov}$" ambiguously: would it be
$$\operatorname{Cov}(u,v) = \frac{1}{n}\left(\sum_{i=1}^n u_i v_i\right) - \left(\frac{1}{n}\sum_{i=1}^n u_i\right)\left(\frac{1}{n}\sum_{i=1}^n v_i\right),\tag{1}$$
which (as a function of the random variables $u$ and $v$) is a random variable, or would it be the matrix
$$\left(\operatorname{Cov}(u,v)\right)_{ij} = \operatorname{Cov}(u_i,v_j) = \mathbb{E}(u_i v_j) - \mathbb{E}(u_i)\mathbb{E}(v_j),\tag{2}$$
which is an $n\times n$ matrix of numbers? Only the context in which such an ambiguous expression appears can tell us which is meant, but the latter may be more common than the former.
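To make the two readings concrete, here is a minimal NumPy sketch (with made-up data and an arbitrary choice of $n=5$) that computes both the single scalar of formula $(1)$, treating $u$ and $v$ as one paired dataset, and an estimate of the $n\times n$ matrix of formula $(2)$ from repeated draws of the random vectors. All names and numbers below are illustrative assumptions, not part of the referenced posts.

```
import numpy as np

rng = np.random.default_rng(0)

# --- Reading (1): u and v as a single paired dataset of n numbers each ---
u = rng.normal(size=5)          # hypothetical n = 5 vector
v = rng.normal(size=5)
scalar_cov = u @ v / len(u) - u.mean() * v.mean()   # formula (1): a single number

# --- Reading (2): u and v as random vectors; Cov(u, v) is an n x n matrix ---
# Estimate E[u_i v_j] - E[u_i] E[v_j] from m independent draws of the pair (u, v).
m, n = 100_000, 5
U = rng.normal(size=(m, n))                 # m draws of the random vector u
V = U + 0.5 * rng.normal(size=(m, n))       # v correlated with u, purely for illustration
matrix_cov = (U.T @ V) / m - np.outer(U.mean(axis=0), V.mean(axis=0))  # formula (2): n x n

print(scalar_cov)        # one number
print(matrix_cov.shape)  # (5, 5)
```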
If $u,v$ are not random vectors, then $\operatorname{Cov}(u,v)$ is the scalar $\Sigma u_i v_i$.
Maybe. This assertion understands $u$ and $v$ in the sense of a population or dataset and assumes the averages of the $u_i$ and $v_i$ in that dataset are both zero. More generally, for such a dataset, their covariance would be given by formula $(1)$ above.
Another nuance is that in many circumstances $(u,v)$ represent a sample of a bivariate population or distribution. That is, they are considered not as an ordered pair of vectors but as a dataset $(u_1,v_1), (u_2,v_2), \ldots, (u_n,v_n)$ wherein each $(u_i,v_i)$ is an independent realization of a common random variable $(U,V)$. Then, it is likely that "covariance" would refer to an estimate of $\operatorname{Cov}(U,V)$, such as
$$\operatorname{Cov}(u,v) = \frac{1}{n-1}\left(\sum_{i=1}^n u_i v_i - \frac{1}{n}\left(\sum_{i=1}^n u_i\right)\left(\sum_{i=1}^n v_i\right)\right).$$
This is the fourth sense of "covariance."
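As a quick sanity check on that estimator, the sketch below (hypothetical data) evaluates the formula directly and compares it with the off-diagonal entry of `numpy.cov`, which uses the same $n-1$ denominator by default.

```
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
u = rng.normal(size=n)
v = 2.0 * u + rng.normal(size=n)   # hypothetical correlated sample

# The estimator written exactly as in the formula above (denominator n - 1).
cov_hat = (np.sum(u * v) - np.sum(u) * np.sum(v) / n) / (n - 1)

# numpy.cov uses the same n - 1 denominator by default (ddof=1).
print(cov_hat, np.cov(u, v)[0, 1])   # the two numbers agree
```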
If two vectors are not random, then their covariance is zero.
This is an unusual interpretation. It appears to take "covariance" in the sense of formula $(2)$ above,
$$\left(\operatorname{Cov}(u,v)\right)_{ij} = \operatorname{Cov}(u_i,v_j) = 0,$$
where each $u_i$ and $v_j$ is considered, in effect, a random variable that happens to be a constant, so every entry of the matrix vanishes.
In a regression context (where vectors, numbers, and random variables all occur together) some of these distinctions are further elaborated in the thread on variance and covariance in the context of deterministic values.
Your confusion arises from the fact that there are two different populations described on the same multidimensional feature space. To clarify, let's work through a concrete example.
We have two populations $\mathcal{A}$ (people in Argentina) and $\mathcal{B}$ (people in Brazil). Each is described using the same two features $X$ and $Y$ ($X$ = height, $Y$ = weight).
Now, in general $Cov_{\mathcal{A}}\left(X,Y\right) \neq Cov_{\mathcal{B}}\left(X,Y\right)$. That is, the relationship between height and weight in Argentina might be different from the relationship in Brazil. This case is what the instructor tried to emphasize. However, in the original question, we assume equality instead.
You should note that the covariance matrix for Argentina in our case is the following symmetric positive semi-definite matrix: $$ \Sigma_{\mathcal{A}} = \begin{pmatrix} Var_{\mathcal{A}}\left(X\right) & Cov_{\mathcal{A}}\left(X,Y\right) \\ Cov_{\mathcal{A}}\left(Y,X\right) & Var_{\mathcal{A}}\left(Y\right)\end{pmatrix} $$
Finally, it doesn't make much sense to talk about the covariance between population $\mathcal{A}$ and population $\mathcal{B}$: covariance is defined only between random variables measured jointly, i.e. taken from the same multivariate distribution.
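For concreteness, here is a minimal sketch of assembling $\Sigma_{\mathcal{A}}$ from a purely made-up Argentine sample of heights and weights; the numbers are illustrative assumptions only.

```
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sample: X = height (cm), Y = weight (kg), weight loosely tied to height.
height = rng.normal(170, 8, size=500)
weight = 0.9 * height - 80 + rng.normal(0, 6, size=500)

# 2 x 2 matrix [[Var(X), Cov(X,Y)], [Cov(Y,X), Var(Y)]], as written above.
sigma_A = np.cov(np.stack([height, weight]))   # rows = variables, columns = observations
print(sigma_A)
```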
Best Answer
Easy example: Let $X$ be a random variable that is $-1$ or $+1$, each with probability $0.5$. Then let $Y$ be a random variable such that $Y=0$ if $X=-1$, and $Y$ is $-1$ or $+1$, each with probability $0.5$, if $X=1$.
Clearly $X$ and $Y$ are highly dependent (since knowing $Y$ allows me to perfectly know $X$), but their covariance is zero: They both have zero mean, and
$$\begin{aligned} \mathbb{E}[XY] &= (-1)\cdot 0\cdot P(X=-1) \\ &\quad + 1\cdot 1\cdot P(X=1,\,Y=1) \\ &\quad + 1\cdot(-1)\cdot P(X=1,\,Y=-1) \\ &= 0. \end{aligned}$$
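The same conclusion can be checked by brute force. Below is a small simulation of this exact construction (the seed and sample size are arbitrary choices): the sample covariance is near zero, while the conditional distributions of $Y$ clearly depend on $X$.

```
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.choice([-1, 1], size=n)                        # X is -1 or +1 with probability 1/2
y = np.where(x == -1, 0, rng.choice([-1, 1], size=n))  # Y = 0 if X = -1, else +/-1 with prob. 1/2

print(np.cov(x, y)[0, 1])                                  # sample covariance: close to 0
print(y[x == -1].mean(), y[x == 1].mean())                 # both conditional means near 0 ...
print((y == 0)[x == -1].mean(), (y == 0)[x == 1].mean())   # ... but P(Y=0|X) differs: 1 vs 0
```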
Or more generally, take any distribution $P(X)$ and any $P(Y|X)$ such that $P(Y=a|X) = P(Y=-a|X)$ for all $X$ (i.e., a joint distribution that is symmetric around the $x$ axis), and you will always have zero covariance. But you will have non-independence whenever $P(Y|X) \neq P(Y)$; i.e., the conditionals are not all equal to the marginal. Or ditto for symmetry around the $y$ axis.
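As a sketch of the "symmetry around the $y$ axis" case, one standard instance is $X$ standard normal with $Y=X^2$: given $Y$, $X$ is $\pm\sqrt{Y}$ with equal probability, so $\operatorname{Cov}(X,Y)=\mathbb{E}[X^3]=0$, yet $Y$ is completely determined by $X$. A quick numerical check (sample size arbitrary):

```
import numpy as np

rng = np.random.default_rng(4)

# X standard normal, Y = X**2: covariance is E[X**3] = 0, but Y depends on X.
x = rng.standard_normal(500_000)
y = x ** 2

print(np.cov(x, y)[0, 1])                  # sample covariance: near 0
print(y[np.abs(x) > 1].mean(), y.mean())   # conditional mean of Y differs from the marginal mean
```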