Integrate Gaussian pdf over Mahalanobis distance set of different Gaussian pdf

Tags: mahalanobis-distance, normal-distribution, probability-distributions

Question: Let $f_X(\boldsymbol x|\boldsymbol\mu_a,S_a)$ be a $d$-dimensional Gaussian probability density function (pdf), let $\tilde{f}_X(\boldsymbol x|\boldsymbol\mu_b,S_b)$ be a different $d$-dimensional Gaussian pdf, and let $R=\left\{\boldsymbol x \,\middle|\, \sqrt{(\boldsymbol x-\boldsymbol \mu_b)^T S_b^{-1}(\boldsymbol x-\boldsymbol \mu_b)} \le \varepsilon\right\}$ be the set of points that have at most a Mahalanobis distance of $\varepsilon$ from $\boldsymbol{\mu}_b$. What is $$F_{X}(\boldsymbol x \in R)=\int_{\boldsymbol x\in R} f_X(\boldsymbol x| \boldsymbol \mu_a, S_a) d\boldsymbol x\quad?$$ In other words, what is the probability measure of the overlap between two Gaussians if we cut off one of them at a certain Mahalanobis distance?

Background: A similar term occurs in an unsolved non-convex optimization problem I am studying. I would guess that the solution is well-known and involves concepts like the non-central Chi-squared distribution and/or the Bhattacharyya coefficient.

What I have tried so far: For the simple case $f_X=\tilde{f}_X$ we get the well-known result $$F_X(\boldsymbol x\in R)=\tilde{F}_X(\boldsymbol x\in R)=\chi^2_d(\varepsilon^2),$$ where $\chi_d^2(\cdot)$ denotes the CDF of the (central) chi-squared distribution with $d$ degrees of freedom, since the squared Mahalanobis distance of a Gaussian from its own mean is a sum of $d$ squared standard normals. If $f_X\neq\tilde{f}_X$, then it seems to me that the solution would involve rescaling via an eigendecomposition of $S_a$ and $S_b$ to "adjust" $\chi^2_d$ accordingly, but I don't see how.
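For a concrete sanity check of this simple case, here is a short Monte Carlo sketch in NumPy (parameters chosen arbitrarily; with $d = 2$ the central chi-squared CDF has the closed form $1 - e^{-x/2}$):

```python
import numpy as np

# Monte Carlo check of F_X(x in R) = chi^2_d CDF at eps^2 when f_X = f~_X.
# d = 2 is used so the chi^2_2 CDF has the closed form 1 - exp(-x/2).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
S = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(S)                  # S = L L^T

eps = 1.5
n = 200_000
X = mu + rng.standard_normal((n, 2)) @ L.T  # samples from N(mu, S)
Z = np.linalg.solve(L, (X - mu).T).T        # whitened samples, N(0, I)
maha = np.linalg.norm(Z, axis=1)            # Mahalanobis distance from mu
mc = np.mean(maha <= eps)

exact = 1.0 - np.exp(-eps**2 / 2)           # chi^2_2 CDF evaluated at eps^2
print(mc, exact)                            # the two values should agree closely
```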
Also, I found this question which is similar but does not include $R$ and is restricted to the 1-dimensional case.

Best Answer

Not a full answer, but too long for a comment.

First, observe that if $\boldsymbol{X} \sim \mathcal{N}\left(\boldsymbol{0}, I\right)$ (where $I$ is the identity matrix), then $A \boldsymbol{X} + \boldsymbol{b} \sim \mathcal{N}\left(\boldsymbol{b}, A A^T\right)$; conversely, if $\boldsymbol{Y} \sim \mathcal{N}(\boldsymbol{\mu}, S)$, then $S^{-1/2} \left(\boldsymbol{Y} - \boldsymbol{\mu}\right) \sim \mathcal{N}\left(\boldsymbol{0}, I\right)$, where $S^{-1/2} = \left(\sqrt{S}\right)^{-1}$ is the inverse of the matrix square root of $S$ (defined according to the general theory of functions of symmetric matrices).
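Both facts (coloring and whitening) are easy to verify empirically; a small NumPy sketch with an arbitrary positive definite $S$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0, 2.0])
A = rng.standard_normal((3, 3))
S = A @ A.T + 3 * np.eye(3)                    # a symmetric positive definite covariance

# matrix square root via eigendecomposition of the symmetric matrix S
w, Q = np.linalg.eigh(S)
S_half = Q @ np.diag(np.sqrt(w)) @ Q.T         # S^{1/2}, symmetric
S_neg_half = Q @ np.diag(1 / np.sqrt(w)) @ Q.T # S^{-1/2}

n = 100_000
Y = mu + rng.standard_normal((n, 3)) @ S_half  # coloring: N(0, I) -> N(mu, S)
Z = (Y - mu) @ S_neg_half                      # whitening: back to N(0, I)

print(np.cov(Y.T))  # approximately S
print(np.cov(Z.T))  # approximately the identity
```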

Notice that $$\left(\boldsymbol{x} - \boldsymbol{\mu}_b\right)^T S_b^{-1} \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right) = \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right)^T S_b^{-1/2} S_b^{-1/2} \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right) = \\ = \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right)^T \left(S_b^{-1/2}\right)^T S_b^{-1/2} \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right) = \left\|S_b^{-1/2} \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right)\right\|^2,$$ where $\left\| \;\cdot\;\right\|$ denotes the Euclidean norm, so $R_{\varepsilon} = \left\{\boldsymbol{x} : \left\|S_b^{-1/2} \left(\boldsymbol{x} - \boldsymbol{\mu}_b\right)\right\| \leq \varepsilon\right\}$. Then for $\boldsymbol{X} \sim \mathcal{N}\left(\boldsymbol{\mu}_a, S_a\right)$, substituting $\boldsymbol{X} = S_a^{1/2} \boldsymbol{Y} + \boldsymbol{\mu}_a$ with $\boldsymbol{Y} \sim \mathcal{N}\left(\boldsymbol{0}, I\right)$, $$ \mathbb{P}\left(\boldsymbol{X} \in R_{\varepsilon}\right) = \mathbb{P}\left( \left\|S_b^{-1/2} \left(\boldsymbol{X} - \boldsymbol{\mu}_b\right)\right\| \leq \varepsilon\right) = \\ = \mathbb{P}\left( \left\|S_b^{-1/2} \left(S_a^{1/2} \boldsymbol{Y} + \boldsymbol{\mu}_a - \boldsymbol{\mu}_b\right)\right\| \leq \varepsilon\right) = \mathbb{P}\left( \left\|S_b^{-1/2} S_a^{1/2} \boldsymbol{Y} + S_b^{-1/2}\left(\boldsymbol{\mu}_a - \boldsymbol{\mu}_b\right)\right\| \leq \varepsilon\right). $$ Let's introduce a new vector $\boldsymbol{W} \sim \mathcal{N}\left(\boldsymbol{\nu}, \Sigma\right)$, where $\boldsymbol{\nu} = S_b^{-1/2}\left(\boldsymbol{\mu}_a - \boldsymbol{\mu}_b\right)$ and $\Sigma = S_b^{-1/2} S_a^{1/2} \left(S_b^{-1/2} S_a^{1/2}\right)^T = S_b^{-1/2} S_a^{1/2} S_a^{1/2} S_b^{-1/2} = S_b^{-1/2} S_a S_b^{-1/2}$.
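The reduction from $(\boldsymbol\mu_a, S_a, \boldsymbol\mu_b, S_b)$ to $(\boldsymbol\nu, \Sigma)$ can be checked by Monte Carlo; here is a sketch with arbitrary example parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_a, mu_b = np.array([0.0, 1.0]), np.array([0.5, -0.5])
Sa = np.array([[1.5, 0.3], [0.3, 0.8]])
Sb = np.array([[1.0, -0.2], [-0.2, 2.0]])

def sqrtm_sym(M):
    # symmetric square root of a symmetric PSD matrix, via eigendecomposition
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(np.sqrt(w)) @ Q.T

Sb_nh = np.linalg.inv(sqrtm_sym(Sb))       # S_b^{-1/2}
nu = Sb_nh @ (mu_a - mu_b)                 # mean of W
Sigma = Sb_nh @ Sa @ Sb_nh                 # covariance of W

eps, n = 1.2, 200_000
# direct: X ~ N(mu_a, Sa), Mahalanobis distance to mu_b under S_b
X = mu_a + rng.standard_normal((n, 2)) @ sqrtm_sym(Sa)
p_direct = np.mean(np.linalg.norm((X - mu_b) @ Sb_nh, axis=1) <= eps)
# reduced: W ~ N(nu, Sigma), plain Euclidean norm
W = nu + rng.standard_normal((n, 2)) @ sqrtm_sym(Sigma)
p_reduced = np.mean(np.linalg.norm(W, axis=1) <= eps)
print(p_direct, p_reduced)                 # two estimates of the same probability
```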

So, the given problem is equivalent to calculating $\mathbb{P}\left(\left\|\boldsymbol{W} \right\| \leq \varepsilon\right)$ for some non-standard Gaussian vector. It gets tricky even for a 1D Gaussian. In this case, for $X \sim \mathcal{N} \left(\mu_a, s_a\right)$ and some $\mu_b \in \mathbb{R}$, $s_b > 0$, we have $W \sim \mathcal{N} \left(\frac{1}{\sqrt{s_b}} (\mu_a - \mu_b), \frac{s_a}{s_b} \right)$, which leads to $$\mathbb{P}\left(|W| \leq \varepsilon\right) = F_W (\varepsilon) - F_W(-\varepsilon) = \\ = \Phi\left( \frac{\varepsilon - \frac{1}{\sqrt{s_b}} (\mu_a - \mu_b)}{\sqrt{\frac{s_a}{s_b}}}\right) - \Phi\left( \frac{-\varepsilon - \frac{1}{\sqrt{s_b}} (\mu_a - \mu_b)}{\sqrt{\frac{s_a}{s_b}}}\right) = \\ = \Phi \left(\frac{\varepsilon \sqrt{s_b} - (\mu_a - \mu_b)}{\sqrt{s_a}}\right) - \Phi \left(\frac{-\varepsilon \sqrt{s_b} - (\mu_a - \mu_b)}{\sqrt{s_a}}\right),$$ where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2} \mathrm{d} t$ is the standard normal CDF. As one can see, this expression becomes much simpler in the case $\mu_a = \mu_b$.
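A numerical check of the 1D formula against simulation (example parameters are arbitrary; $\Phi$ is built from `math.erf`):

```python
import math
import random

def Phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu_a, s_a = 0.7, 1.3    # mean and variance of X
mu_b, s_b = -0.2, 0.6   # parameters defining the 1D Mahalanobis interval
eps = 1.0

# closed-form expression from above
closed = (Phi((eps * math.sqrt(s_b) - (mu_a - mu_b)) / math.sqrt(s_a))
          - Phi((-eps * math.sqrt(s_b) - (mu_a - mu_b)) / math.sqrt(s_a)))

# Monte Carlo estimate of P(|X - mu_b| / sqrt(s_b) <= eps)
random.seed(0)
n = 200_000
hits = sum(abs(random.gauss(mu_a, math.sqrt(s_a)) - mu_b) / math.sqrt(s_b) <= eps
           for _ in range(n))
print(closed, hits / n)
```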

But as I mentioned in the comments, I don't think it's possible to obtain a closed-form solution in the multidimensional case (though if $\boldsymbol{\mu}_a = \boldsymbol{\mu}_b$, things may get a bit simpler).


EDIT: for $\boldsymbol{W} \sim \mathcal{N}\left(\boldsymbol{\nu}, \Sigma\right)$ we can write $ \left\|\boldsymbol{W} \right\|^2 = \left\|\Sigma^{1/2} \boldsymbol{Y} + \boldsymbol{\nu}\right\|^2 $ for $\boldsymbol{Y} \sim \mathcal{N}\left(\boldsymbol{0}, I\right)$. Let $\left<\cdot, \cdot \right>$ denote the standard dot product; then $$ \left\|\boldsymbol{W} \right\|^2 = \left<\Sigma^{1/2} \boldsymbol{Y} + \boldsymbol{\nu}, \Sigma^{1/2} \boldsymbol{Y} + \boldsymbol{\nu}\right> = \left<\Sigma^{1/2} \boldsymbol{Y}, \Sigma^{1/2} \boldsymbol{Y} \right> + 2 \left<\Sigma^{1/2} \boldsymbol{Y}, \boldsymbol{\nu}\right> + \left<\boldsymbol{\nu}, \boldsymbol{\nu} \right> = \\ = \left<\Sigma \boldsymbol{Y}, \boldsymbol{Y} \right> + 2 \left<\Sigma^{1/2} \boldsymbol{Y}, \boldsymbol{\nu}\right> + \left\|\boldsymbol{\nu}\right\|^2. $$ As $\Sigma^{1/2}$ is a symmetric PSD matrix, $\Sigma^{1/2} = Q \Lambda Q^T$ for some orthogonal $Q$ and diagonal $\Lambda$, and $\Sigma = Q \Lambda^2 Q^T$. So, continuing the previous equalities, $$ \left\|\boldsymbol{W} \right\|^2 = \left<Q \Lambda^2 Q^T \boldsymbol{Y}, \boldsymbol{Y} \right> + 2 \left<Q \Lambda Q^T \boldsymbol{Y}, \boldsymbol{\nu}\right> + \left\|\boldsymbol{\nu}\right\|^2 = \\ = \left<\Lambda^2 Q^T \boldsymbol{Y}, Q^T \boldsymbol{Y} \right> + 2 \left<\Lambda Q^T \boldsymbol{Y}, Q^T\boldsymbol{\nu}\right> + \left\|\boldsymbol{\nu}\right\|^2. $$ As the standard Gaussian distribution is spherically symmetric, $Q^T \boldsymbol{Y}$ and $\boldsymbol{Y}$ have equal distributions ($Q^T \boldsymbol{Y} \overset{d}{=} \boldsymbol{Y}$), so $$ \left\|\boldsymbol{W} \right\|^2 \overset{d}{=} \left<\Lambda^2 \boldsymbol{Y}, \boldsymbol{Y} \right> + 2 \left<\Lambda \boldsymbol{Y}, Q^T\boldsymbol{\nu}\right> + \left\|\boldsymbol{\nu}\right\|^2 = \\ = \sum_{i} \lambda_i^2 \boldsymbol{Y}_i^2 + 2 \sum_{i} \lambda_i \left(Q^T\boldsymbol{\nu}\right)_i \boldsymbol{Y}_i + \left\|\boldsymbol{\nu}\right\|^2, $$ where $\lambda_i$ is the $i$-th diagonal element of $\Lambda$. Each $\boldsymbol{Y}_i^2$ follows a $\chi^2_1$ distribution, and $2 \sum_{i} \lambda_i \left(Q^T\boldsymbol{\nu}\right)_i \boldsymbol{Y}_i + \left\|\boldsymbol{\nu}\right\|^2$ is Gaussian with mean $\left\|\boldsymbol{\nu}\right\|^2$ and variance $4 \left\|\Lambda Q^T\boldsymbol{\nu}\right\|^2$.
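The distributional identity can be validated by comparing moments of the two sides over independent samples (a rough Monte Carlo sketch with an arbitrary $\Sigma$ and $\boldsymbol\nu$):

```python
import numpy as np

rng = np.random.default_rng(3)
nu = np.array([0.8, -0.3, 0.5])
B = rng.standard_normal((3, 3))
Sigma = B @ B.T + 0.5 * np.eye(3)            # arbitrary positive definite covariance

# Sigma^{1/2} = Q Lam Q^T, hence Sigma = Q Lam^2 Q^T
w, Q = np.linalg.eigh(Sigma)
lam = np.sqrt(w)                              # diagonal of Lambda
S_half = Q @ np.diag(lam) @ Q.T

n = 300_000
Y = rng.standard_normal((n, 3))
W = Y @ S_half + nu                           # W ~ N(nu, Sigma)
lhs = np.sum(W**2, axis=1)                    # ||W||^2

Y2 = rng.standard_normal((n, 3))              # independent standard normals
c = Q.T @ nu
rhs = Y2**2 @ lam**2 + 2 * Y2 @ (lam * c) + nu @ nu   # RHS of the identity

print(lhs.mean(), rhs.mean())                 # means should agree
print(lhs.var(), rhs.var())                   # variances should agree
```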

EDIT: my original claim was that $\left\|\boldsymbol{W} \right\|^2$ follows a generalized chi-squared distribution $\tilde{\chi}^2\left(\boldsymbol{w}, \boldsymbol{k}, \boldsymbol{\lambda}, m, s\right)$ with $$ \boldsymbol{w}_i = \lambda_i^2, \; \boldsymbol{k}_i = 1, \; \boldsymbol{\lambda}_i = 0, \; m = \left\|\boldsymbol{\nu}\right\|^2, \; s = 4 \left\|\Lambda Q^T\boldsymbol{\nu}\right\|^2, $$ but this parametrization is not applicable as written, since the Gaussian term is not independent of the chi-squared ones (both are built from the same $\boldsymbol{Y}_i$). Completing the square repairs this: writing $\boldsymbol{c} = Q^T \boldsymbol{\nu}$ (so that $\left\|\boldsymbol{\nu}\right\|^2 = \left\|\boldsymbol{c}\right\|^2$) and assuming all $\lambda_i > 0$, $$ \sum_{i} \lambda_i^2 \boldsymbol{Y}_i^2 + 2 \sum_{i} \lambda_i \boldsymbol{c}_i \boldsymbol{Y}_i + \left\|\boldsymbol{c}\right\|^2 = \sum_{i} \lambda_i^2 \left(\boldsymbol{Y}_i + \frac{\boldsymbol{c}_i}{\lambda_i}\right)^2, $$ a weighted sum of independent noncentral $\chi^2_1$ variables. So $\left\|\boldsymbol{W} \right\|^2$ follows the generalized chi-squared distribution $\tilde{\chi}^2\left(\boldsymbol{w}, \boldsymbol{k}, \boldsymbol{\lambda}, m, s\right)$ with $$ \boldsymbol{w}_i = \lambda_i^2, \; \boldsymbol{k}_i = 1, \; \boldsymbol{\lambda}_i = \left(\boldsymbol{c}_i / \lambda_i\right)^2, \; m = 0, \; s = 0. $$
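Assuming all $\lambda_i > 0$, completing the square turns the expression into $\sum_i \lambda_i^2 (\boldsymbol{Y}_i + c_i/\lambda_i)^2$ with $\boldsymbol{c} = Q^T\boldsymbol\nu$, a weighted sum of independent noncentral $\chi^2_1$ terms; this identity holds exactly per sample, as a quick NumPy check shows:

```python
import numpy as np

rng = np.random.default_rng(4)
nu = np.array([1.0, -0.5, 0.3])
B = rng.standard_normal((3, 3))
Sigma = B @ B.T + 0.5 * np.eye(3)        # positive definite, so every lambda_i > 0

w_eig, Q = np.linalg.eigh(Sigma)
lam = np.sqrt(w_eig)                      # diagonal of Lambda (Sigma^{1/2} = Q Lam Q^T)
c = Q.T @ nu

n = 200_000
Y = rng.standard_normal((n, 3))
# original form: sum_i lam_i^2 Y_i^2 + 2 sum_i lam_i c_i Y_i + ||nu||^2
orig = Y**2 @ lam**2 + 2 * Y @ (lam * c) + nu @ nu
# completed square: sum_i lam_i^2 (Y_i + c_i / lam_i)^2, noncentral chi^2_1 terms
completed = ((Y + c / lam) ** 2) @ lam**2
print(np.max(np.abs(orig - completed)))   # identical up to floating-point rounding
```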
