[Math] Which optimization algorithm converges faster

MATLAB, nonlinear optimization, optimization

Hi everyone. I have a large-scale unconstrained optimization problem. If I treat the unconstrained problem as a constrained problem whose bounds are infinite, I should be able to use both the fminunc and fmincon functions in MATLAB. When using fminunc, I can provide the gradient and the sparsity pattern of the Hessian, while with fmincon I can choose the L-BFGS method to approximate the Hessian.

So, here is my question: which one converges faster? Are there good reasons to choose one over the other, or any principles to guide such a choice?

Update:
Actually, the problem is a least-squares problem. However, the number of terms is large, so the Jacobian would be too big to store; this is why I don't use the lsqnonlin function. The algorithm fminunc uses for large-scale problems is a trust-region method (details can be found in the fminunc documentation), and the algorithm in fmincon is L-BFGS (see the fmincon documentation). What I'm asking for is more of a general comparison, since there are many C/C++ implementations of these algorithms; answers need not be constrained to MATLAB. Thanks!

Update 2:
I was unclear; my problem has the form
$$
\min_W \sum_i \|Y_i - W X_i\|_F^2,
$$
where the norm is the Frobenius norm, since $Y_i$ and $W X_i$ are matrices.

Update 3:
Thank you all! littleO is right. But I'm still curious about the relative efficiency of the methods mentioned above in the general case.

Best Answer

If your problem is truly a least-squares problem: \begin{equation} \text{minimize}_x \quad \|Ax - b\|_2^2 \end{equation} then you should probably use a technique from numerical linear algebra rather than a general-purpose optimization algorithm. Least-squares problems have been studied extensively in numerical linear algebra, and very good algorithms and libraries exist for them.

In Matlab, you can solve least squares problems just using the backslash operator:

x = A\b;

$A$ can be stored as a sparse matrix in Matlab if it's huge.
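The backslash one-liner hides a numerically stable least-squares solve. As a language-agnostic illustration (the OP notes that answers need not be constrained to MATLAB), here is a tiny pure-Python sketch that solves a made-up overdetermined $3 \times 2$ system via the normal equations $A^TA x = A^Tb$; for large or ill-conditioned problems you would use a QR-based solver instead, which is what backslash does:

```python
# Tiny least-squares demo: minimize ||Ax - b||_2^2 for an overdetermined
# 3x2 toy system by solving the normal equations A^T A x = A^T b directly.
# Pure Python for illustration only; in practice use a QR-based solver
# (MATLAB's backslash, numpy.linalg.lstsq, LAPACK's gels, ...).

A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
b = [1.0, 2.0, 2.0]

# Form G = A^T A (2x2) and g = A^T b (2x1).
G = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
g = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

# Solve the 2x2 system G x = g by Cramer's rule.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x = [(g[0] * G[1][1] - G[0][1] * g[1]) / det,
     (G[0][0] * g[1] - g[0] * G[1][0]) / det]

print(x)  # best-fit intercept and slope: [2/3, 1/2]
```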

Edit: Your specific problem can be expressed as a least squares problem. First note that \begin{align} \|Y_i - W X_i\|_F &= \| Y_i^T - X_i^T W^T \|_F \\ &= \| B^i - A^i Z \|_F, \end{align} where in the last line I have set $B^i = Y_i^T$, $A^i = X_i^T$, and $Z = W^T$.

Furthermore, \begin{align} \| B^i - A^i Z \|_F &= \left \| \begin{bmatrix} B^i_1 - A^i Z_1 \\ \vdots \\ B^i_N - A^i Z_N \end{bmatrix} \right \|_2 \end{align} where $B^i_1,\ldots,B^i_N$ are the columns of $B^i$ and $Z_1,\ldots,Z_N$ are the columns of $Z$. Let \begin{equation} \tilde{B}^i = \begin{bmatrix} B^i_1 \\ \vdots \\ B^i_N \end{bmatrix} \end{equation} and \begin{equation} \tilde{A}^i = \begin{bmatrix} A^i & & \\ & \ddots & \\ & & A^i \end{bmatrix} \end{equation} and \begin{equation} \tilde{Z} = \begin{bmatrix} Z_1 \\ \vdots \\ Z_N \end{bmatrix}. \end{equation} Then \begin{equation} \|B^i - A^i Z \|_F = \|\tilde{A}^i \tilde{Z} - \tilde{B}^i \|_2. \end{equation} Note that $\tilde{Z}$ and $\tilde{B}^i$ are column vectors while $\tilde{A}^i$ is a matrix, so we are close to expressing our problem as a standard least squares problem.
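The column-stacking identity above can be checked numerically. Here is a small pure-Python sketch (toy sizes and random data, chosen only for illustration) verifying that $\|B - A Z\|_F^2$ equals the squared 2-norm of the stacked per-column residuals $B_j - A Z_j$:

```python
# Numerical check of the vectorization step on tiny matrices:
#   ||B - A Z||_F^2  ==  sum_j || B[:, j] - A @ Z[:, j] ||_2^2,
# i.e. the Frobenius norm of the residual matrix equals the 2-norm of
# the column-stacked residual vector.  Pure Python, fixed small sizes.

import random

random.seed(0)
p, n, m = 3, 2, 2          # B is p x m, A is p x n, Z is n x m

A = [[random.random() for _ in range(n)] for _ in range(p)]
Z = [[random.random() for _ in range(m)] for _ in range(n)]
B = [[random.random() for _ in range(m)] for _ in range(p)]

# Left side: squared Frobenius norm of B - A Z.
AZ = [[sum(A[i][k] * Z[k][j] for k in range(n)) for j in range(m)]
      for i in range(p)]
lhs = sum((B[i][j] - AZ[i][j]) ** 2 for i in range(p) for j in range(m))

# Right side: accumulate the squared 2-norm column by column -- the j-th
# block of the stacked residual vector is B[:, j] - A @ Z[:, j].
rhs = 0.0
for j in range(m):
    for i in range(p):
        r = B[i][j] - sum(A[i][k] * Z[k][j] for k in range(n))
        rhs += r ** 2

print(abs(lhs - rhs) < 1e-12)  # the two objectives agree
```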

Next note that \begin{align} \sum_{i=1}^K \|Y_i - W X_i \|_F^2 &= \sum_{i=1}^K \|\tilde{A}^i \tilde{Z} - \tilde{B}^i \|_2^2. \end{align} Let \begin{equation} A = \begin{bmatrix} \tilde{A}^1 \\ \vdots \\ \tilde{A}^K \end{bmatrix} \end{equation} (the blocks are stacked vertically, not block-diagonally, since the same unknown $\tilde{Z}$ appears in every term) and \begin{equation} b = \begin{bmatrix} \tilde{B}^1 \\ \vdots \\ \tilde{B}^K \end{bmatrix}. \end{equation} We finally see that your problem is equivalent to \begin{equation} \text{minimize}_{\tilde{Z}} \quad \|A \tilde{Z} - b \|_2^2. \end{equation}

An analytical solution is given by \begin{equation} \tilde{Z} = (A^T A)^{-1} A^T b, \end{equation} provided $A$ has full column rank. ($\tilde{Z}$ satisfies the normal equations.) However, in practice you should compute $\tilde{Z}$ using a least-squares solver. In Matlab, it's just

Ztilde = A\b;
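For reference, the normal equations mentioned above follow from setting the gradient of the objective to zero:
$$
\nabla_{\tilde{Z}} \, \|A \tilde{Z} - b\|_2^2 = 2 A^T (A \tilde{Z} - b) = 0
\quad \Longrightarrow \quad
A^T A \, \tilde{Z} = A^T b,
$$
which has the unique solution $\tilde{Z} = (A^T A)^{-1} A^T b$ whenever $A$ has full column rank.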
