I have two finite-sampled signals, $x_1$ and $x_2$, and I want to check for statistical independence.
I know that for two statistically independent signals, their joint probability distribution is a product of the two marginal distributions.
I have been advised to use histograms in order to approximate the distributions. Here's a small example.
    x1 = rand(1, 50);
    x2 = randn(1, 50);
    n1 = hist(x1);
    n2 = hist(x2);
    n3 = hist3([x1' x2']);
Since I am using the default number of bins, `n1` and `n2` are 10-element vectors, and `n3` is a 10×10 matrix.
My question is this: How do I check whether `n3` is in fact a product of `n1` and `n2`?
Do I use an outer product? And if I do, should I use `x1'*x2` or `x1*x2'`? And why?
Also, I have noticed that `hist` returns the number of elements (the frequency) in each bin. Should this be normalized in any way? (I haven't exactly understood how `hist3` works either.)
Thank you very much for your help. I'm really new to statistics so some explanatory answers would really help.
Best Answer
Assuming that the theoretical distributions of $x_1$ and $x_2$ are not known, a naive algorithm for determining independence would be as follows:
Define $x_{1,2}$ to be the set of all co-occurrences of values from $x_1$ and $x_2$. For example, if $x_1 = \{1, 2, 2\}$ and $x_2 = \{3, 6, 5\}$, the set of co-occurrences would be $\{(1,3), (1,6), (1,5), (2,3), (2,6), (2,5), (2,3), (2,6), (2,5)\}$.
A simple way to estimate a PDF from a sample is to compute the sample's histogram and then normalize it so that the estimated PDF integrates to 1. In practice, that means dividing the bin counts of the histogram by the factor $h \cdot \mathrm{sum}(n)$, where $h$ is the bin width and $n$ is the vector of bin counts.
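As a minimal sketch of that normalization in MATLAB, assuming the default equally spaced bins of `hist` (the variable names `centers`, `h` and `p` are just illustrative):

```matlab
% Turn raw histogram counts into a PDF estimate.
x = randn(1, 50);              % example sample, as in the question
[n, centers] = hist(x);        % n: counts per bin, centers: bin centers
h = centers(2) - centers(1);   % bin width (default bins are equally spaced)
p = n / (h * sum(n));          % PDF estimate: counts divided by h * sum(n)

% Sanity check: the estimate integrates to 1 (Riemann sum over the bins).
area = sum(p) * h;
```

For a two-dimensional histogram the same idea should apply with the bin area (the product of the two bin widths) in place of $h$.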
Note that step 3 of this algorithm requires the user to specify a threshold for deciding whether the signals are independent.
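To make the comparison concrete, here is a rough sketch using the paired samples from the question. It assumes `hist3` from the Statistics Toolbox is available, measures the mismatch with a root-sum-of-squares difference (my choice of metric, not the only possible one), and uses a purely hypothetical threshold `epsilon` that you would have to tune yourself:

```matlab
x1 = rand(1, 50);
x2 = randn(1, 50);

% Joint histogram: rows correspond to bins of x1, columns to bins of x2.
[n3, c] = hist3([x1' x2']);        % n3 is 10x10, c holds the bin centers
h1 = c{1}(2) - c{1}(1);            % bin width along x1
h2 = c{2}(2) - c{2}(1);            % bin width along x2

% Normalize the joint counts so the estimate integrates to 1.
p12 = n3 / (h1 * h2 * sum(n3(:)));

% Marginal PDF estimates obtained from the joint estimate.
p1 = sum(p12, 2) * h2;             % 10x1 column: marginal of x1
p2 = sum(p12, 1) * h1;             % 1x10 row: marginal of x2

% Outer product: column (x1) times row (x2) gives a 10x10 matrix whose
% layout matches p12. With row vectors from hist, the analogous
% expression would be n1' * n2.
pprod = p1 * p2;

% Distance between the joint estimate and the product of the marginals.
d = sqrt(sum((p12(:) - pprod(:)).^2));

epsilon = 0.5;                     % hypothetical threshold, must be tuned
looksIndependent = d < epsilon;
```

The column-times-row order matters only because of how `hist3` lays out its output; the other order would give a scalar (an inner product) rather than a matrix. Keep in mind that with 50 samples spread over 100 bins the estimates are very noisy, so `d` will not be close to zero even for truly independent signals.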