Let $X\sim \mathcal{N}(\mu,\sigma^2)$ be a Gaussian random variable with random mean $\mu\sim {\sf Bernoulli}(p)$, i.e., $\mu=1$ with probability $p$ and $\mu=0$ with probability $1-p$. In other words, to draw a realization of $X$, one first flips a coin to decide whether $\mu=0$ or $\mu=1$, and then samples from the Gaussian distribution with the chosen mean.
Consider the family $\mathcal{M}$ of Bernoulli random variables taking on the values $0$ or $1$.
Let $\mathcal{I}\left(Y,X\right)$ be the mutual information between $Y$ and $X$ (a natural definition can be found, e.g., in this mathoverflow post).
My intuition suggests that a global solution of the optimization problem
$$\max_{Y\in\mathcal{M}} \mathcal{I}(Y,X)$$
is given by $\mu$ itself, i.e.,
$$\mathcal{I}(\mu,X) = \max_{Y\in\mathcal{M}} \mathcal{I}(Y,X).$$
Any reference or ideas on how to prove or disprove this claim would be appreciated.
Best Answer
$\newcommand{\de}{\delta}\newcommand{\M}{\mathcal M} \newcommand{\ep}{\varepsilon} \newcommand{\thh}{\theta}\newcommand\I{\mathcal I}\newcommand{\Si}{\Sigma}$Here is the (slightly edited) definition of the mutual information given in the answer linked in the OP:

> Let $X$ be a random variable with values in a measurable space $(S,\Si)$, and let $Y$ be a random variable with values in another measurable space. Let $\gamma$ denote the joint distribution of the pair $(Y,X)$, with marginals $\gamma_Y$ and $\gamma_X$. If $\gamma$ is absolutely continuous with respect to the product measure $\gamma_Y\otimes\gamma_X$, define \begin{equation*} \I(Y,X):=\int \ln\frac{d\gamma}{d(\gamma_Y\otimes\gamma_X)}\,d\gamma; \end{equation*} otherwise, define $\I(Y,X):=\infty$.
Let us now answer the current question.
Much more generally than in the OP, let $X$ be any r.v. as in the highlighted text above. Let $\nu$ denote the distribution of $X$.
Let $\M$ denote the set of all Bernoulli r.v.'s $Y$ (defined on the same probability space as $X$) with $P(Y=1)\in(0,1)$; we suppose that the probability space is rich enough so that for any coupling $\gamma$ of any Bernoulli distribution and the distribution $\nu$ of $X$ there is a r.v. $Y$ such that the joint distribution of the pair $(Y,X)$ is $\gamma$.
In view of the highlighted definition, for any $Y\in\M$, \begin{equation*} \I(Y,X)=\int_S d\nu\; \Big(\rho\ln\frac{\rho}{p}+(1-\rho)\ln\frac{1-\rho}{1-p}\Big), \end{equation*} where $p:=P(Y=1)$, $\rho:=\frac{d\nu_1}{d\nu}$, and $\nu_1(B):=P(Y=1,X\in B)$ for all $B\in\Si$; note that $\nu_1\le\nu$, so the Radon--Nikodym derivative $\rho$ exists and takes values in $[0,1]$ $\nu$-almost everywhere. Here we use the convention $0\times\text{anything}:=0$.
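To make $\rho$ concrete, here is an illustration (not part of the original answer) in the OP's setting, taking $Y=\mu$ and writing $\varphi_m$ for the $\mathcal N(m,\sigma^2)$ density. Then $\nu$ has density $p\varphi_1+(1-p)\varphi_0$ with respect to Lebesgue measure, $\nu_1$ has density $p\varphi_1$, and hence
\begin{equation*}
\rho(x)=\frac{p\,\varphi_1(x)}{p\,\varphi_1(x)+(1-p)\,\varphi_0(x)},
\end{equation*}
which is just the posterior probability $P(\mu=1\mid X=x)$. Note that $0<\rho(x)<1$ for every $x$ here, so for $Y=\mu$ the density $\rho$ is never an indicator function.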
We want to maximize $\I(Y,X)$ over all $Y\in\M$. This is easy to do, noting that $u\ln\frac{u}{p}+(1-u)\ln\frac{1-u}{1-p}$ is convex in $u\in[0,1]$: for a given $p$, the integral above is therefore maximized at the extreme points of the constraint set $\{\rho\colon 0\le\rho\le1,\ \int_S\rho\,d\nu=p\}$, that is, when $\rho=1_A$ for some $A\in\Si$ with $\nu(A)=p$, so that $Y=1_{\{X\in A\}}$ is a function of $X$. Then \begin{equation*} \I(Y,X)=\int_A d\nu\;\ln\frac1{p}+\int_{S\setminus A} d\nu\;\ln\frac1{1-p} =p\ln\frac1p+(1-p)\ln\frac1{1-p}, \end{equation*} the binary entropy of $p$ (in nats); equivalently, $\I(Y,X)\le H(Y)$ in general, with equality iff $Y$ is $\nu$-a.e.\ a function of $X$. Maximizing the latter expression in $p\in(0,1)$, we see that, contrary to the conjecture in the OP,
\begin{equation} \max_{Y\in\M}\I(Y,X)=\ln 2, \end{equation} attained at $Y=1_{\{X\in A\}}$ with $\nu(A)=1/2$, for any r.v. $X$ whose distribution $\nu$ assigns measure $1/2$ to some $A\in\Si$ (in particular, for any $X$ with an atomless distribution, such as the Gaussian mixture in the OP), provided that $X$ is defined on a rich enough probability space. Since the OP's $\mu$ is not a function of $X$, it does not attain this maximum.
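As a numerical sanity check (not part of the original answer; the mixture parameters $p=1/2$, $\sigma=1$ and means $0,1$ are chosen here purely for concreteness), one can verify that $\I(\mu,X)$ is strictly smaller than $\I(Y,X)$ for the deterministic threshold variable $Y=1_{\{X>1/2\}}$, so $\mu$ cannot be the maximizer:

```python
import numpy as np

# Illustrative check for the OP's mixture with p = 1/2, sigma = 1, means 0 and 1
# (parameters chosen for concreteness): compare I(mu, X) with I(Y, X) for the
# deterministic threshold Y = 1{X > 1/2}.

p = 0.5
x = np.linspace(-10.0, 11.0, 20001)
dx = x[1] - x[0]

def phi(x, m):
    """Density of N(m, 1)."""
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

f0, f1 = phi(x, 0.0), phi(x, 1.0)
f_mix = (1 - p) * f0 + p * f1  # density of X (the mixture)

# I(mu, X) = integral of joint density * log(joint / product of marginals), in nats
integrand = (1 - p) * f0 * np.log(f0 / f_mix) + p * f1 * np.log(f1 / f_mix)
I_mu = np.sum(integrand) * dx  # Riemann sum; ample accuracy on this fine grid

# Y = 1{X > 1/2} is a function of X with P(Y = 1) = 1/2 (the mixture is
# symmetric about 1/2), so I(Y, X) = H(Y) = ln 2 nats.
I_threshold = np.log(2.0)

print(f"I(mu, X) ~ {I_mu:.4f} nats")
print(f"I(Y,  X) = {I_threshold:.4f} nats")
```

The threshold variable strictly beats $\mu$, which disproves the conjecture in the OP.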