Solved – How to compute mean vector and covariance matrix of equal distributions

machine learning, normal distribution, probability, self-study

This question is an extended version of this one.

As you can see here, the two distributions are equal, and I need to compute the parameters $a$, $b$, $c$, $d$, and $e$. Could you show me a way to do that?


Assume a two-class problem with equal a priori class probabilities and Gaussian class-conditional densities as follows:

$$p(x\mid w_1) = {\cal N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},\begin{bmatrix} a & c \\ c & b \end{bmatrix}\right)\quad\text{and}\quad p(x\mid w_2) =
{\cal N}\left(\begin{bmatrix} d \\ e \end{bmatrix},\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)$$

where $ab-c^2=1$.

Best Answer

Disclaimer: The answer below responded to the original version of the OP's question, which was quite different in nature and less specific than the current version.


$p(x\mid w_1)$ and $p(x\mid w_2)$ are equal.

OK, this is going to take a lot longer to answer.

In some statistical applications, a statistician (or a machine, since you included machine learning as a tag) needs to decide which of two hypotheses is true: $H_1 \colon w = w_1$ or $H_2 \colon w = w_2$. It is known that $$P(w = w_1) = P(w = w_2) = \frac{1}{2}.$$ This is what the equal a priori probabilities you keep referring to mean.

Here is a simple method: Always decide that $w = w_1$, that is, always declare hypothesis $H_1$ to be the true hypothesis. When in fact $H_1$ is true, your decision is perfectly correct; when in fact $H_2$ is true, your decision is perfectly wrong, and thus you have a $50\%$ chance of making an error. More sophisticated methods use a coin toss or a call to a random number generator to decide, but unfortunately they still have a $50\%$ chance of making an error; the same as the simpler mulish insistence that $H_1$ is always true.
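If you want to see this in action, here is a minimal simulation sketch (my own illustration, not part of the original answer, assuming NumPy is available); it draws the true hypothesis with equal priors, always decides $H_1$, and confirms the $50\%$ error rate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True hypothesis for each trial: 1 or 2, each with prior probability 1/2.
truth = rng.choice([1, 2], size=n)

# The mulish strategy: always decide H_1, regardless of any observation.
decision = np.ones(n, dtype=int)

error_rate = np.mean(decision != truth)
print(f"Empirical error rate: {error_rate:.4f}")  # close to 0.5
```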

To get better performance (i.e., a smaller error probability), the statistician might observe a random variable whose distribution depends on the value of $w$. If $w = w_1$, the distribution is $p(x\mid w_1)$; if $w = w_2$, the distribution is $p(x\mid w_2)$. For example, if $w = w_1$, $x$ is a normal random variable with mean $100$ and variance $1$, while if $w = w_2$, $x$ is a standard normal random variable with mean $0$ and variance $1$. So if the statistician observes that $x$ has the value $101.2$, it is highly likely that $w = w_1$, and thus very likely that $H_1$ is true, because a standard normal random variable is quite unlikely to take such a large value. On the other hand, if $x$ has a small value (say between $-4$ and $+4$), then it is quite likely that $H_2$ is true and $w = w_2$. But notice that all this depends critically on the distributions $p(x\mid w_1)$ and $p(x\mid w_2)$ being different. If the distributions are the same, then observing $x$ is of no help in deciding between $H_1$ and $H_2$. Thus, when you claim that
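Continuing the illustration, the sketch below (again mine, assuming NumPy and SciPy are available) implements the natural decision rule for this example: with equal priors, decide for whichever density is larger at the observed $x$, which for these two normals reduces to the midpoint threshold $x > 50$. The empirical error rate is essentially zero because the two distributions are so far apart:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 100_000

# Draw the true hypothesis with equal priors, then observe x:
# N(100, 1) under H_1 and N(0, 1) under H_2.
truth = rng.choice([1, 2], size=n)
x = np.where(truth == 1,
             rng.normal(100.0, 1.0, size=n),
             rng.normal(0.0, 1.0, size=n))

# Decide for the hypothesis whose density is larger at the observed x;
# here that is equivalent to the threshold x > 50.
decision = np.where(norm.pdf(x, loc=100, scale=1) > norm.pdf(x, loc=0, scale=1), 1, 2)

error_rate = np.mean(decision != truth)
print(f"Empirical error rate: {error_rate:.6f}")  # essentially zero
```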

$p(x\mid w_1)$ and $p(x\mid w_2)$ are equal

you are effectively insisting that observing $x$ is useless as far as deciding between $H_1$ and $H_2$ is concerned.

So, how are these distributions known in the first place? The client might provide them to the statistician based on knowledge of how the client's apparatus works. Your professor, like Professor Indiana Jones in the movie Raiders of the Lost Ark, might be making them up as he goes along (remember that $99\frac{44}{100}\%$ of all statistics are made up!). In the context of machine learning, there may be training samples provided: here are $200$ observations of $x$ when $H_1$ is true, and here are $200$ more when $H_2$ is true. (In your particular problem, $x$ is a bivariate normal random variable with possibly correlated components when $H_1$ is true and independent unit-variance components when $H_2$ is true, and so each sample would be a pair of numbers.) The machine estimates $p(x\mid w_1)$ from the first set of observations and $p(x\mid w_2)$ from the second set, and uses these estimates when making decisions once the real work comes along.
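To make that estimation step concrete, here is a short sketch (mine, not the original answerer's; the "true" parameter values are made up purely for illustration, chosen so that $ab - c^2 = 2\cdot 1 - 1^2 = 1$) showing the usual sample-mean and sample-covariance estimators applied to $200$ training pairs per class:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up "true" parameters, for illustration only; note ab - c^2 = 2*1 - 1 = 1.
mean_1 = np.zeros(2)
cov_1 = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
mean_2 = np.array([1.0, -1.0])
cov_2 = np.eye(2)

# 200 hypothetical training pairs per class.
samples_1 = rng.multivariate_normal(mean_1, cov_1, size=200)
samples_2 = rng.multivariate_normal(mean_2, cov_2, size=200)

# Sample mean vectors and (unbiased) sample covariance matrices.
print("class 1 mean estimate:", samples_1.mean(axis=0))
print("class 1 covariance estimate:\n", np.cov(samples_1, rowvar=False))
print("class 2 mean estimate:", samples_2.mean(axis=0))
print("class 2 covariance estimate:\n", np.cov(samples_2, rowvar=False))
```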

In summary, your claim that $p(x\mid w_1) = p(x\mid w_2)$ means that $x$ is totally useless in distinguishing the two cases. For your particular distributions, equality holds (if you nevertheless continue to insist on equality) exactly when $a=b=1$ and $c=d=e=0$, because two multivariate normal distributions are equal exactly when their mean vectors and covariance matrices are equal (in which case $ab-c^2 = 1$ as desired). There is no way of solving for $a,b,c,d,e$, or of saying what values of $a,b,c,d,e$ make sense in your problem, based on the information that you have provided. You need to be given these by your professor, or you need to be given training data so that you can estimate these parameters, or you should emulate Professor Jones and make up some numbers (subject to the constraints that $ab - c^2 = 1$ and $a, b > 0$) and solve the problem using these.
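For completeness, the matching argument behind those values can be written out explicitly (this short derivation is my addition, but it follows directly from the problem statement): equating the two mean vectors and the two covariance matrices gives

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} d \\ e \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} a & c \\ c & b \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\;\Longrightarrow\;
d = e = 0,\quad a = b = 1,\quad c = 0,$$

and indeed $ab - c^2 = 1\cdot 1 - 0^2 = 1$, consistent with the stated constraint.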