[Math] Calculating the mean and standard deviation of a Gaussian mixture model of two curves

game theory, gaussian, normal distribution, probability distributions

An Elo rating can be treated as a Gaussian curve with a mean and a standard deviation. Suppose two such ratings belong to the same player (he's using two separate online identities, so he has two separate ratings). How would I best merge the two curves into one curve representing the Elo of the persona?

Extending the question based on comments below:

The rating's mean is the approximate skill of the player, and the standard deviation is the level of confidence of the system in the skill approximation.

The suggested model is to use a Gaussian mixture model with some probability of picking each of the identities, and then calculate the mean and standard deviation of the resulting distribution. I know the mixed distribution is not Gaussian, but I need just two parameters, so this is what I am after.

In short

How do you calculate the mean and standard deviation of a Gaussian mixture model of two Gaussian curves ($\mu_1$, $\sigma_1$) and ($\mu_2$, $\sigma_2$), chosen with probabilities $p$ and $1-p$ respectively?

Best Answer

$\newcommand{\N}{\mathcal{N}}\newcommand{\Var}{\mathrm{Var}}\newcommand{\E}{\Bbb{E}}$Assume that the two personas are represented by distributions $X_1\sim \N\left(\mu_1, \sigma_1^2\right)$ and $X_2\sim \N\left(\mu_2, \sigma_2^2\right)$, where $\mu_k$ and $\sigma_k^2$ are the mean and variance respectively of $X_k$, for $k=1,2$. Assume that $X_1$ and $X_2$ are independent.

We can model the overall persona as coming from $X_1$ with some probability $p$, or coming from $X_2$ otherwise (with probability $1-p$).

That is, if $Z$ is the overall persona, then $Z = IX_1 + (1-I)X_2$, where $I$ is a random variable that is $1$ with probability $p$ and $0$ with probability $1-p$, and $I,X_1,X_2$ are independent.

In this case, $Z$ (the overall persona) is modelled as a Gaussian Mixture Model, with probability density function $f_Z(z) = pf_{X_{1}}(z)+(1-p)f_{X_{2}}(z)$, where $f_{X_{k}}$ is the probability density function of $X_k$, $k=1,2$.
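As a rough illustration of this model, here is a minimal Python sketch of the mixture density and of drawing one sample of $Z$ by first drawing the indicator $I$. The function names (`normal_pdf`, `mixture_pdf`, `sample_mixture`) are made up for this example, and only the standard library is used.

```python
import math
import random


def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(z, mu1, sigma1, mu2, sigma2, p):
    """Mixture density f_Z(z) = p * f_X1(z) + (1 - p) * f_X2(z)."""
    return p * normal_pdf(z, mu1, sigma1) + (1.0 - p) * normal_pdf(z, mu2, sigma2)


def sample_mixture(mu1, sigma1, mu2, sigma2, p, rng=random):
    """Draw one value of Z = I * X1 + (1 - I) * X2."""
    pick_first = rng.random() < p  # plays the role of the indicator I
    if pick_first:
        return rng.gauss(mu1, sigma1)
    return rng.gauss(mu2, sigma2)
```

Note that the sampler never adds the two ratings together: each draw comes entirely from one component, which is what distinguishes a mixture from a weighted sum of $X_1$ and $X_2$.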

If you just want the mean and variance of the overall persona $Z$ (to use for a Gaussian model), the formulas are given below; the standard deviation asked for in the question is then $\sqrt{\Var(Z)}$.

$\Bbb{E}[Z] = p \mu_1 + (1-p)\mu_2$

and

$\Var(Z) = p\sigma_1^2 +(1-p)\sigma_2^2 + p(1-p)\left(\mu_1-\mu_2\right)^2.$
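The two formulas translate directly into code. The following is a small sketch, not a reference implementation: the function name `mixture_mean_std` and the example ratings are hypothetical, and the inputs are assumed to be standard deviations (not variances), as in the question.

```python
import math


def mixture_mean_std(mu1, sigma1, mu2, sigma2, p):
    """Return (mean, standard deviation) of the two-component mixture."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    mean = p * mu1 + (1.0 - p) * mu2
    variance = (p * sigma1 ** 2 + (1.0 - p) * sigma2 ** 2
                + p * (1.0 - p) * (mu1 - mu2) ** 2)
    return mean, math.sqrt(variance)


# Hypothetical ratings for the two identities, weighted equally.
mean, std = mixture_mean_std(1500.0, 100.0, 1700.0, 50.0, 0.5)
print(mean, std)
```

The last term of the variance grows with the gap between the two means, so two confident but very different ratings still merge into a wide curve.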


To prove the formulas for the mean and variance of $Z$, recall the following facts:

  1. $\E[Z] = \E[\E[Z\mid I]]$ by the Law of Total Expectation

  2. $\Var(Z) = \E[\Var(Z\mid I)] + \Var(\E[Z\mid I])$ by the Law of Total Variance

  3. If $Y$ is a random variable that takes value $a$ with probability $p$ and value $b$ with probability $1-p$ (where $a,b$ are constants), then $\E[Y] = pa+(1-p)b$ and $\Var(Y) = p(1-p)(a-b)^2$.
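
Putting the three facts together gives a short derivation (a sketch, in the same notation as above). Conditional on $I$, the variable $Z$ is simply $X_1$ or $X_2$, so

$\E[Z\mid I] = I\mu_1 + (1-I)\mu_2, \qquad \Var(Z\mid I) = I\sigma_1^2 + (1-I)\sigma_2^2.$

Taking expectations over $I$ gives the mean (fact 1) and the first term of fact 2:

$\E[Z] = p\mu_1 + (1-p)\mu_2, \qquad \E[\Var(Z\mid I)] = p\sigma_1^2 + (1-p)\sigma_2^2.$

Since $\E[Z\mid I]$ takes the value $\mu_1$ with probability $p$ and $\mu_2$ with probability $1-p$, fact 3 with $a=\mu_1$, $b=\mu_2$ gives

$\Var(\E[Z\mid I]) = p(1-p)\left(\mu_1-\mu_2\right)^2.$

Adding the last two results, as fact 2 prescribes, recovers $\Var(Z) = p\sigma_1^2 +(1-p)\sigma_2^2 + p(1-p)\left(\mu_1-\mu_2\right)^2$.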