Least-Squares Solution that Minimizes Two Expressions

least squareslinear algebranumerical linear algebranumerical optimization

I am trying to solve for $B_k$ in the following equation, as the function of two separate least-squares expressions:

$$
B_k = \arg \min ||X_k – B_kD_kA^T||^2 + \mu_k||B_k – P_kB^*||^2
$$

This is part of a larger algorithm for non-negative PARAFAC2 that I am trying to implement, and is applicable for 3rd order tensors. However, for the purposes of this expression we are only considering one slice of the tensor.

I understand that to solve for $B_k$, we would set $X_k = B_kD_kA^T$ and take the right inverse of $D_kA^T$ to isolate $B_k$. However, I am unsure of how to set up the expression such that $B_k$ is the least-squares solution to both of the expressions. It must be solved using non-negative least squares, as per the article.

Link to the paper

Best Answer

After some more research, I think that I have come up with a solution to the problem. First, take the derivative with respect to $B_k$, and set it to 0.

$$ \frac{\partial}{\partial B_{k}}(||X_k - B_kD_kA^T||^2 + \mu_k||B_k-P_kB^*||^2) = 0 $$

Understanding that the sum of squared residuals can also be described as the square of the Frobenius Norm of the difference between $X_k$ and $B_kD_kA^T$, or $B_k$ and $P_kB^*$ via:

$$ ||A||_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n|a_{ij}|^2} = \sqrt{tr(A^TA)} = \sqrt{tr(AA^T)} $$

We can rearrange the original equation, adding an arbitrary constant, $\frac{1}{2}$, to aid with simplification later on.

$$ \frac{\partial}{\partial B_{k}}\frac{1}{2}tr((X_k - B_kD_kA^T)(X_k - B_kD_kA^T)^T) + \frac{\partial}{\partial B_{k}}\frac{1}{2}\mu_k(tr((B_k - P_kB^*)(B_k-P_kB^*)^T) = 0 $$

Expanding the equations, where $(X_k - B_kD_kA^T)^T = X_k^T - AD_kB_k^T$ and $(B_k - P_kB^*)^T = B_k^T - B^{*T}P_k^T$:

$$ \frac{\partial}{\partial B_k}\frac{1}{2}(tr(X_kX_k^T) -tr(X_kAD_kB_k^T) - tr(B_kD_kA^TX_k^T) + tr(B_kD_kA^TAD_kB_k^T)) +\frac{\partial}{\partial B_k}\frac{1}{2}\mu_k(tr(B_kB_k^T)-tr(B_kB^{*T}P_k^T) - tr(P_kB^*B_k^T) + tr(P_kB^*B^{*T}P_k^T)) = 0 $$

This equation can be simplified by using some convenient identities from the Matrix Cookbook: $$ \frac{\partial}{\partial{B_k}}tr(X_kX_k^T) = 0 $$

$$ \frac{\partial}{\partial{B_k}}tr(X_kAD_kB_k^T) = X_kAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^T) = X_kAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^T) = X_kAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kD_kA^TAD_kB_k^T) = 2B_kD_kA^TAD_k $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kB^{*T}P_k^T) = PkB^* $$

$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B_k^T) = P_kB^* $$

$$ \frac{\partial}{\partial{B_k}}tr(P_kB^*B^{*T}P_k^T) = 0 $$

$$ \frac{\partial}{\partial{B_k}}tr(B_kB_k^T) = 2B_k $$

Substituting into the previous equation yields:

$$ -X_kAD_k + B_kD_kA^TAD_k + \mu_kB_k - \mu_kP_kB^* = 0 $$

Multiplying by the inverse of $D_kA^TAD_k$:

$$ -X_kAD_k(D_kA^TAD_k)^{-1} + B_k + \mu_kB_k(D_kA^TAD_k)^{-1} - \mu_kP_kB^*(D_kA^TAD_k)^{-1} = 0 $$

Solving for $B_k$:

$$ -X_kAD_k + \mu_kB_k - \mu_kP_kB^* = -B_k(D_kA^TAD_k) $$

$$ -X_kAD_k - \mu_kP_kB^* = -B_k(D_kA^TAD_k) - \mu_kB_k $$

Yields the final form of the equation: $$ B_k = \frac{X_kAD_k + \mu_kP_kB^*}{D_kA^TAD_k + \mu_kI_R} $$

Although the notation used differs from the notation used in the paper (i.e. $B_k$ is described as belonging to the first mode and not the second as was described in the paper), the solution is identical to the solution found in the published code.

Related Question