Okay, so I have a $n$-dimensional normal distribution with the means and co-variance matrix defined (for now we can assume that these are the true distribution parameters, not estimates). For a given data point I want to calculate the probability that this point belongs to this distribution.
I believe I would be interested in the probability of generating a point "at least as unlikely" as the given data point. In the $1$-D normal case this would be the area under the "two tails" of the PDF, e.g. (for $x \ge \mu$) $1-(\operatorname{CDF}(x)-\operatorname{CDF}(2\mu-x))$. Now I want to calculate this probability for a general $n$-dimensional normal distribution. I believe maybe this should be $1$ minus "the probability mass inside the hyper-sphere in Mahalanobis distance space whose radius is the Mahalanobis distance of the data point from the mean".
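For the $1$-D case, the two-tailed probability above can be checked numerically (a sketch using `scipy.stats.norm`; the values of `mu`, `sigma`, and `x` are arbitrary examples, with `x >= mu`):

```python
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 1.5  # example parameters, x >= mu

# area outside the two mirror points x and 2*mu - x
p_two_tail = 1.0 - (norm.cdf(x, mu, sigma) - norm.cdf(2 * mu - x, mu, sigma))

# equivalent form: twice the upper-tail probability of the standardized distance
p_check = 2.0 * norm.sf(abs(x - mu) / sigma)
```

Both expressions give the probability of drawing a point at least as far from $\mu$ as $x$ is.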
Question 1: Is this interpretation correct?
Question 2: How do I calculate this?
Question 3: Does the analysis change if the mean and covariance matrix are only estimates of the true parameters?
Question 4: Is there an easy way to do this in Python?
My best attempt at solving this myself is below, but my statistics knowledge isn't very good, so I am asking for confirmation of correctness; it is important that I get this right.
According to Wikipedia:
$$(x-\mu)^T\Sigma^{-1}(x-\mu) \le \chi_k^2(p)$$
So I can first calculate the squared Mahalanobis distance $\mathrm{MD}^2 = (x-\mu)^T\Sigma^{-1}(x-\mu)$ as above, and then maybe I just have to evaluate the CDF of the chi-squared distribution with $k$ degrees of freedom at $\mathrm{MD}^2$, and take $1$ minus that.
I have no idea if this is correct, but currently my best guess.
Thanks in advance.
Best Answer
Yeah, that sounds right. If you have parameters $\mu$ and $\Sigma$ and data point $x$, then the set of all data points that are less likely than $x$ is exactly the set of points with lower density, or equivalently larger Mahalanobis distance: \begin{align*} (2\pi)^{-k/2}|\Sigma|^{-1/2}e^{\left[-\frac{1}{2}(y-\mu)^T\Sigma^{-1}(y-\mu) \right]} &\le (2\pi)^{-k/2}|\Sigma|^{-1/2}e^{\left[-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \right]} \iff\\ e^{\left[-\frac{1}{2}(y-\mu)^T\Sigma^{-1}(y-\mu) \right]} &\le e^{\left[-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \right]} \iff \\ (y-\mu)^T\Sigma^{-1}(y-\mu) &\ge (x-\mu)^T\Sigma^{-1}(x-\mu). \end{align*}
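You can sanity-check this equivalence numerically (a sketch using `scipy.stats.multivariate_normal`; the two points and the covariance matrix are arbitrary examples I picked for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.zeros(2)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mvn = multivariate_normal(mu, sigma)

def mahalanobis_sq(p):
    """Squared Mahalanobis distance (p - mu)^T Sigma^{-1} (p - mu)."""
    d = p - mu
    return d @ np.linalg.solve(sigma, d)

x = np.array([0.5, 0.2])   # point near the mean
y = np.array([2.0, -1.5])  # point far from the mean

# lower density <=> larger squared Mahalanobis distance
same_ordering = (mvn.pdf(y) <= mvn.pdf(x)) == (mahalanobis_sq(y) >= mahalanobis_sq(x))
```

Here `y` has the larger Mahalanobis distance and correspondingly the lower density.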
So you want the probability of some unseen $Y$ being at least as unlikely as your observed $x$, which is the same as $Q = (Y-\mu)^T\Sigma^{-1}(Y-\mu)$ being at least $(x-\mu)^T\Sigma^{-1}(x-\mu)$. And you are correct in pointing out that $Q \sim \chi^2(k)$ when $\mu$ and $\Sigma$ are the true parameters. So $$ \mathbb{P}\left[(Y-\mu)^T\Sigma^{-1}(Y-\mu) \ge (x-\mu)^T\Sigma^{-1}(x-\mu)\right] = 1- \mathbb{P}\left[Q \le (x-\mu)^T\Sigma^{-1}(x-\mu)\right]. $$
In Python you can calculate that with something like this:
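A minimal sketch using NumPy and SciPy (the helper name `tail_probability` is my own, not a library function):

```python
import numpy as np
from scipy.stats import chi2

def tail_probability(x, mu, sigma):
    """P[Q >= d2], where d2 is the squared Mahalanobis distance of x
    under a k-dimensional normal with mean mu and covariance sigma."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu), via a linear solve instead of an explicit inverse
    d2 = diff @ np.linalg.solve(np.asarray(sigma, dtype=float), diff)
    # survival function of chi-squared with k = len(mu) degrees of freedom, i.e. 1 - CDF
    return chi2.sf(d2, df=len(mu))

# Example: 2-D standard normal, point at (1, 1), so d2 = 2
p = tail_probability([1.0, 1.0], [0.0, 0.0], np.eye(2))  # = exp(-1) for k = 2
```

Using `chi2.sf` rather than `1 - chi2.cdf` is numerically safer for points far out in the tail.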