I mentioned in the comments that the problem looks like quadratic programming. And indeed it is. Here are the details. (There is a lot of computation, so there might be mistakes.)
For convenience, put $y = Ax$, which is a random vector in $\mathbb{R}^n$. You are trying to minimize $\text{Var}(y_1) + \dotsb + \text{Var}(y_n)$. Let's express this objective in terms of what's known.
We have the constraint $\mathbb{E}[Ax] = b$. The covariance matrix of $y$ is then $\Sigma = \mathbb{E}[(y - b)(y - b)^T]$, and $\text{Var}(y_1) + \dotsb + \text{Var}(y_n)$ is exactly the sum of the diagonal entries of $\Sigma$, i.e., $\text{tr}(\Sigma)$.
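As a quick numerical sanity check of this identity (a minimal sketch with arbitrary illustrative data, not part of the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated samples of a random vector y in R^3
samples = rng.normal(size=(100_000, 3)) @ rng.normal(size=(3, 3))

Sigma = np.cov(samples, rowvar=False)       # empirical covariance of y
print(np.trace(Sigma))                      # tr(Sigma)
print(samples.var(axis=0, ddof=1).sum())    # Var(y_1) + ... + Var(y_n), same value
```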
Now let's simplify this even further.
\begin{align*}
(y - b)(y - b)^T & = yy^T - b y^T - yb^T + bb^T\\
& = Axx^T A^T - b x^T A^T - Ax b^T + bb^T
\end{align*}
The covariance matrix is
\begin{align*}
\Sigma & = \mathbb{E}[(y - b)(y - b)^T]\\
& = \mathbb{E}[Axx^T A^T] - \mathbb{E}[bx^T A^T] - \mathbb{E}[Axb^T] + bb^T
\end{align*}
and
\begin{align*}
\text{tr}(\Sigma) & = \text{tr}(\mathbb{E}[Axx^T A^T]) - \text{tr}(\mathbb{E}[bx^T A^T]) - \text{tr}(\mathbb{E}[Axb^T]) + \text{tr}(bb^T)
\end{align*}
For the first term, we use three facts: (1) the trace commutes with expectation; (2) the trace is invariant under cyclic permutations; (3) $x$ is deterministic, so it can be pulled out of the expectation. We get
\begin{align*}
\text{tr}(\mathbb{E}[Axx^T A^T]) & = \mathbb{E}[\text{tr}(Axx^T A^T)]\\
& = \mathbb{E}[\text{tr}(A^T A xx^T)]\\
& = \text{tr}(\mathbb{E}[A^T A xx^T])\\
& = \text{tr}(\mathbb{E}[A^T A] xx^T)
\end{align*}
Since you know the mean and variance of $A$, $\mathbb{E}[A^T A]$ can be computed from this information (a concrete sketch is given after this derivation). Let's put $M = \mathbb{E}[A^T A]$. Note that $M$ is a positive semidefinite matrix of size $n \times n$. Continuing, we have
\begin{align*}
\text{tr}(\mathbb{E}[Axx^T A^T]) & = \text{tr}(M xx^T)\\
& = \text{tr}(x^T M x)\\
& = x^T M x
\end{align*}
That takes care of the first term.
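To make this concrete: if the entries of $A$ are independent (an assumption I'm adding here; with correlated entries you would also need the covariances), then $\mathbb{E}[A_{ki}A_{kj}] = F_{ki}F_{kj} + \text{Var}(A_{ki})\,[i = j]$, so $M = F^TF + \text{diag}(v)$ with $v_i = \sum_k \text{Var}(A_{ki})$. Here is a sketch that builds $M$ this way and checks $\text{tr}(\mathbb{E}[Axx^TA^T]) = x^TMx$ by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
F = rng.normal(size=(m, n))                 # E[A]
V = rng.uniform(0.1, 1.0, size=(m, n))      # entrywise variances of A

# Assuming independent entries: M = E[A^T A] = F^T F + diag(column sums of V)
M = F.T @ F + np.diag(V.sum(axis=0))

x = rng.normal(size=n)
A = F + np.sqrt(V) * rng.normal(size=(200_000, m, n))  # samples of A
y = A @ x                                              # samples of Ax
print(np.mean(np.einsum('si,si->s', y, y)))  # E[tr(A x x^T A^T)] = E[||Ax||^2]
print(x @ M @ x)                             # x^T M x, approximately equal
```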
For the second term, put $F = \mathbb{E}[A] \in \mathbb{R}^{m \times n}$, which is known. We get
\begin{align*}
- \text{tr}(\mathbb{E}[b x^T A^T]) & = - \text{tr}(b x^T F^T)\\
& = - \text{tr}(x^T F^T b)\\
& = - x^T F^T b
\end{align*}
Doing the same for the third term gives $-\text{tr}(\mathbb{E}[Axb^T]) = -\text{tr}(Fxb^T) = -b^TFx = -x^TF^Tb$ as well.
So in the end, our objective $\text{Var}(y_1) + \dotsb + \text{Var}(y_n)$ simplifies to
$$ x^T M x - 2 x^T F^T b + \text{tr}(bb^T) $$
This is just a quadratic expression in $x$. Moreover, since $M$ is positive semidefinite, this is a convex quadratic expression.
The whole problem boils down to the following quadratic program:
\begin{align*}
\underset{x \in \mathbb{R}^n}{\text{minimize}} & \quad x^T M x - 2x^T F^T b + \text{tr}(bb^T)\\
\text{subject to} & \quad F x = b\\
& \quad x \geq 0,
\end{align*}
where $M = \mathbb{E}[A^T A] \in \mathbb{R}^{n \times n}$, $F = \mathbb{E}[A] \in \mathbb{R}^{m \times n}$. Pretty much any convex optimization package can handle it; for example, CVXPY (see the sketch after the update below).
Update: As fedja pointed out in the comments, the problem simplifies further. Since $Fx = b$, the term $-2x^T F^T b$ is just $-2b^T b$, a constant. For optimization purposes we can drop it, along with the last term $\text{tr}(bb^T) = b^Tb$, from the objective. In the end, we are left with
\begin{align*}
\underset{x \in \mathbb{R}^n}{\text{minimize}} & \quad x^T M x\\
\text{subject to} & \quad F x = b\\
& \quad x \geq 0.
\end{align*}
It's still a quadratic program.
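Here is a minimal CVXPY sketch of this final program (the data below is purely illustrative, and $M$ is built under the independent-entries assumption sketched above):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
m, n = 4, 6
F = rng.normal(size=(m, n))                   # E[A]
V = rng.uniform(0.1, 1.0, size=(m, n))        # entrywise variances of A
M = F.T @ F + np.diag(V.sum(axis=0))          # E[A^T A], independent entries
b = F @ rng.uniform(0.1, 1.0, size=n)         # right-hand side, feasible by construction

x = cp.Variable(n, nonneg=True)               # encodes x >= 0
problem = cp.Problem(cp.Minimize(cp.quad_form(x, M)),
                     [F @ x == b])
problem.solve()
print(problem.value, x.value)
```

Note that `quad_form` verifies that the matrix is positive semidefinite before compiling the problem, which is exactly the convexity condition above.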
Best Answer
After some more research, I believe I have a solution to the problem. First, take the derivative with respect to $B_k$ and set it to zero.
$$ \frac{\partial}{\partial B_{k}}(||X_k - B_kD_kA^T||^2 + \mu_k||B_k-P_kB^*||^2) = 0 $$
Note that each sum of squared residuals is the squared Frobenius norm of the corresponding difference (of $X_k$ and $B_kD_kA^T$, or of $B_k$ and $P_kB^*$), where
$$ ||A||_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n|a_{ij}|^2} = \sqrt{tr(A^TA)} = \sqrt{tr(AA^T)} $$
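A quick numerical check of this identity (illustrative data):

```python
import numpy as np

A = np.random.default_rng(4).normal(size=(3, 5))
print(np.linalg.norm(A, 'fro'))       # ||A||_F
print(np.sqrt(np.trace(A.T @ A)))     # sqrt(tr(A^T A)), same value
print(np.sqrt(np.trace(A @ A.T)))     # sqrt(tr(A A^T)), same value
```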
We can now rewrite the original equation in trace form, multiplying by a constant factor of $\frac{1}{2}$ (which does not change the minimizer) to simplify the algebra later on.
$$ \frac{\partial}{\partial B_{k}}\frac{1}{2}tr((X_k - B_kD_kA^T)(X_k - B_kD_kA^T)^T) + \frac{\partial}{\partial B_{k}}\frac{1}{2}\mu_k\,tr((B_k - P_kB^*)(B_k-P_kB^*)^T) = 0 $$
Expanding the equations, where $(X_k - B_kD_kA^T)^T = X_k^T - AD_kB_k^T$ and $(B_k - P_kB^*)^T = B_k^T - B^{*T}P_k^T$:
$$ \frac{\partial}{\partial B_k}\frac{1}{2}(tr(X_kX_k^T) -tr(X_kAD_kB_k^T) - tr(B_kD_kA^TX_k^T) + tr(B_kD_kA^TAD_kB_k^T)) +\frac{\partial}{\partial B_k}\frac{1}{2}\mu_k(tr(B_kB_k^T)-tr(B_kB^{*T}P_k^T) - tr(P_kB^*B_k^T) + tr(P_kB^*B^{*T}P_k^T)) = 0 $$
This equation can be simplified by using some convenient identities from the Matrix Cookbook: $$ \frac{\partial}{\partial{B_k}}tr(X_kX_k^T) = 0 $$
$$ \frac{\partial}{\partial{B_k}}tr(X_kAD_kB_k^T) = X_kAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^TX_k^T) = X_kAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^TAD_kB_k^T) = 2B_kD_kA^TAD_k $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kB^{*T}P_k^T) = P_kB^* $$
$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B_k^T) = P_kB^* $$
$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B^{*T}P_k^T) = 0 $$
$$ \frac{\partial}{\partial{B_k}}tr(B_kB_k^T) = 2B_k $$
Substituting into the previous equation yields:
$$ -X_kAD_k + B_kD_kA^TAD_k + \mu_kB_k - \mu_kP_kB^* = 0 $$
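This stationarity condition is easy to spot-check with finite differences. Below is a minimal sketch (the dimensions and the diagonal form of $D_k$ are my own illustrative assumptions) comparing the analytic gradient against a central-difference gradient of the $\frac{1}{2}$-scaled objective:

```python
import numpy as np

rng = np.random.default_rng(5)
I_k, J, R = 5, 7, 3                     # illustrative dimensions
X = rng.normal(size=(I_k, J))
A = rng.normal(size=(J, R))
D = np.diag(rng.uniform(0.5, 1.5, R))   # D_k, assumed diagonal
P = rng.normal(size=(I_k, R))
Bs = rng.normal(size=(R, R))            # B^*
mu = 0.3
B = rng.normal(size=(I_k, R))           # point at which to check the gradient

def f(B):
    # (1/2)||X - B D A^T||_F^2 + (mu/2)||B - P B^*||_F^2
    return 0.5 * np.linalg.norm(X - B @ D @ A.T, 'fro')**2 \
         + 0.5 * mu * np.linalg.norm(B - P @ Bs, 'fro')**2

# Analytic gradient from the stationarity condition above
analytic = -X @ A @ D + B @ D @ A.T @ A @ D + mu * B - mu * P @ Bs

# Central-difference gradient, entry by entry
eps = 1e-6
G = np.zeros_like(B)
for i in range(I_k):
    for j in range(R):
        E = np.zeros_like(B)
        E[i, j] = eps
        G[i, j] = (f(B + E) - f(B - E)) / (2 * eps)
print(np.allclose(G, analytic))         # True
```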
Solving for $B_k$: collect the $B_k$ terms on one side,
$$ B_kD_kA^TAD_k + \mu_kB_k = X_kAD_k + \mu_kP_kB^* $$
$$ B_k(D_kA^TAD_k + \mu_kI_R) = X_kAD_k + \mu_kP_kB^* $$
Multiplying on the right by the inverse of $D_kA^TAD_k + \mu_kI_R$ (which exists whenever $\mu_k > 0$, since $D_kA^TAD_k$ is positive semidefinite for symmetric $D_k$) yields the final form of the equation:
$$ B_k = (X_kAD_k + \mu_kP_kB^*)(D_kA^TAD_k + \mu_kI_R)^{-1} $$
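In code, this update is a single linear solve; forming the inverse explicitly is unnecessary. A minimal sketch (shapes are illustrative assumptions; $D_k$ is taken to be diagonal, consistent with the identities above):

```python
import numpy as np

rng = np.random.default_rng(6)
I_k, J, R = 5, 7, 3                      # illustrative dimensions
X_k = rng.normal(size=(I_k, J))
A = rng.normal(size=(J, R))
D_k = np.diag(rng.uniform(0.5, 1.5, R))  # assumed diagonal
P_k = rng.normal(size=(I_k, R))
B_star = rng.normal(size=(R, R))
mu_k = 0.1

S = D_k @ A.T @ A @ D_k + mu_k * np.eye(R)    # D_k A^T A D_k + mu_k I_R
T = X_k @ A @ D_k + mu_k * P_k @ B_star       # X_k A D_k + mu_k P_k B^*
B_k = np.linalg.solve(S, T.T).T               # B_k = T S^{-1}, using symmetry of S
```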
Although the notation here differs from that of the paper (e.g. $B_k$ is associated with the first mode rather than the second), the solution is identical to the one found in the published code.