Solved – Covariance and correlation in multivariate random variables

correlation, covariance

I have an experiment with two random vectors, $P_1 = (x_1,y_1)$ and $P_2 = (x_2,y_2)$. These two vectors represent measurements of the locations of two nearby points ($10$ meters apart), taken with two independent sensors. I want to calculate the covariance matrix and the correlation coefficient between $P_1$ and $P_2$ using the $10{,}000$ measurements I have. I know how to do this for two random variables, but not for two random vectors.

Best Answer

Let $\mathbf{x}$ be a random column vector. In matrix notation, the covariance matrix for $\mathbf{x}$ can be expressed as:

$$ \Sigma = E\left[\left( \mathbf{x} - E[\mathbf{x}]\right) \left(\mathbf{x} - E[\mathbf{x}]\right)' \right] $$

The sample analogue is:

$$ \hat{\Sigma} = \frac{1}{n-1} \sum_i \left( \mathbf{x}_i - \hat{\boldsymbol{\mu}}\right) \left(\mathbf{x}_i - \hat{\boldsymbol{\mu}}\right)' \quad \quad \hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_i \mathbf{x}_i $$

where each $\mathbf{x}_i$ is a column vector containing the $i$th observation.
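As a minimal sketch of the sample formula above, here is a plain-Python version that accumulates the outer products of the deviations and divides by $n-1$. The function names `sample_mean` and `sample_covariance` and the four data points are made up for illustration; they are not from the question.

```python
def sample_mean(observations):
    """Component-wise mean of a list of equal-length observation vectors."""
    n = len(observations)
    k = len(observations[0])
    return [sum(obs[j] for obs in observations) / n for j in range(k)]


def sample_covariance(observations):
    """Sum of outer products of deviations from the mean, divided by n - 1."""
    n = len(observations)
    k = len(observations[0])
    mu = sample_mean(observations)
    sigma = [[0.0] * k for _ in range(k)]
    for obs in observations:
        d = [obs[j] - mu[j] for j in range(k)]  # x_i - mu_hat
        for a in range(k):
            for b in range(k):
                sigma[a][b] += d[a] * d[b]      # entry (a, b) of the outer product
    return [[s / (n - 1) for s in row] for row in sigma]


# Hypothetical data: 4 observations of a 2-dimensional vector.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
mu_hat = sample_mean(data)
sigma_hat = sample_covariance(data)
```

The result is a symmetric $k \times k$ matrix: the diagonal holds the sample variances of the components and the off-diagonal entries hold their sample covariances.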

A common approach is to put your $n$ observations in an $n \times k$ data matrix $X$ in which each row is one observation. This is the usual convention in statistics texts, and many programming environments use a similar layout.

$$ X = \left[ \begin{array}{c} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \\ \end{array} \right] $$

Various operations can be expressed quite elegantly with matrix notation using the data matrix $X$. The sample covariance matrix can be written as

$$(X - \hat{\boldsymbol{\mu}}')'(X - \hat{\boldsymbol{\mu}}') / (n - 1)$$

where $X - \hat{\boldsymbol{\mu}}'$ means you subtract the row vector $\hat{\boldsymbol{\mu}}'$ from each row of $X$.
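The data-matrix formula can be sketched in plain Python as follows. One way to apply it to the question (an assumption on my part, not something stated there) is to stack each paired measurement into a single row $(x_1, y_1, x_2, y_2)$, so that $k = 4$. The $4 \times 4$ result then contains the covariance matrix of $P_1$ in the top-left $2 \times 2$ block, that of $P_2$ in the bottom-right block, and the cross-covariance between $P_1$ and $P_2$ in the off-diagonal block. The helper names and the five rows of data are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def covariance_from_data_matrix(X):
    """Compute (X - mu')' (X - mu') / (n - 1) for an n-by-k data matrix X."""
    n = len(X)
    k = len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(k)]
    Xc = [[row[j] - mu[j] for j in range(k)] for row in X]  # subtract mu' from each row
    S = matmul(transpose(Xc), Xc)
    return [[s / (n - 1) for s in row] for row in S]


# Hypothetical data: each row is one paired measurement (x1, y1, x2, y2).
X = [
    [0.0, 0.0, 10.0, 0.0],
    [1.0, 2.0, 11.0, 1.0],
    [2.0, 1.0, 12.0, 3.0],
    [3.0, 3.0, 13.0, 2.0],
    [4.0, 4.0, 14.0, 4.0],
]
sigma = covariance_from_data_matrix(X)

cov_p1 = [row[:2] for row in sigma[:2]]    # covariance matrix of P1
cov_p2 = [row[2:] for row in sigma[2:]]    # covariance matrix of P2
cross_cov = [row[2:] for row in sigma[:2]] # cross-covariance of P1 with P2
```

Stacking only makes sense if the $i$th measurement of $P_1$ is paired with the $i$th measurement of $P_2$, for example because they were taken at the same time.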

(Note: bold letters are vectors, upper case are matrices, lower case are scalars, and $'$ means taking the transpose.)

Matlab comment:

In Matlab, you can easily follow the formulas exactly: make a data matrix $X$, compute $\hat{\boldsymbol{\mu}}$, and compute $\hat{\Sigma}$. There are also built-in functions, mean and cov, which compute the sample mean and the sample covariance matrix for you.
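The question also asks about correlation. Given a covariance matrix, the usual conversion is to divide each entry by the product of the two corresponding standard deviations, $\rho_{ab} = \Sigma_{ab} / \sqrt{\Sigma_{aa}\Sigma_{bb}}$. Here is a minimal sketch in Python rather than Matlab; the function name and the $2 \times 2$ input are illustrative only, and the sketch assumes every variance on the diagonal is strictly positive.

```python
import math


def correlation_from_covariance(sigma):
    """Scale a covariance matrix to a correlation matrix.

    Assumes every diagonal entry (variance) is strictly positive.
    """
    k = len(sigma)
    sd = [math.sqrt(sigma[j][j]) for j in range(k)]  # standard deviations
    return [[sigma[a][b] / (sd[a] * sd[b]) for b in range(k)] for a in range(k)]


# Hypothetical covariance matrix: variances 4 and 9, covariance 3.
sigma = [[4.0, 3.0], [3.0, 9.0]]
corr = correlation_from_covariance(sigma)
# corr[0][1] is 3 / (2 * 3) = 0.5, and the diagonal entries are 1.0
```

For two random vectors this gives a matrix of pairwise correlation coefficients between components rather than a single number.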
