Merit Function Line Search in Interior Point Methods – Penalty Parameter Updates

Tags: algorithms, computational mathematics, numerical methods, numerical optimization, optimization

I guess the TL;DR is:
Why do the penalty parameter and step length calculations in interior point methods not account for the steadily decreasing step length of a backtracking line search when computing the predicted and actual reductions of the merit function?

A little bit more detail:
Following some of the publications on the interior point solvers KNITRO and IPOPT, e.g.,

a globalization strategy is implemented for the interior point problem
\begin{align}
\text{min}_{x,s}\ &\varphi_{\mu}(x,s) = f(x) - \mu \sum_{i=1}^p \log(s_i)\\
s.t.\ & g(x)=0\\
&h(x)+s=0
\end{align}

based on the merit function
$$ \phi_{\nu}(x,s;\mu)=\varphi_{\mu}(x,s) + \nu\left|\left|\begin{bmatrix}g(x)\\ h(x)+s\end{bmatrix}\right|\right|.$$
In order to calculate acceptable step lengths $\alpha$ for a given search direction $d=[d_x,\,d_s]^\top$ the well-known fraction-to-the-boundary rule is applied so that we get a maximum step length $\alpha^{\text{max}}$.
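For concreteness, the primal part of the fraction-to-the-boundary rule can be sketched as follows; this is my own minimal sketch, and the function name and the value $\tau=0.995$ are illustrative choices, not taken from the cited solvers:

```python
import numpy as np

def fraction_to_boundary(s, d_s, tau=0.995):
    """Largest alpha in (0, 1] such that s + alpha*d_s >= (1 - tau)*s.

    Illustrative sketch: tau = 0.995 is a typical but assumed value.
    """
    neg = d_s < 0
    if not np.any(neg):
        return 1.0  # no slack decreases, so the full step is allowed
    # componentwise bound: alpha <= -tau * s_i / d_s_i wherever d_s_i < 0
    return float(min(1.0, np.min(-tau * s[neg] / d_s[neg])))
```

Only the components with $d_{s,i}<0$ can push a slack toward the boundary, which is why the mask restricts the minimum to those entries.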
Then, an Armijo-type line search is performed using a quadratic/linear model of the merit function, i.e.,
\begin{align}
m(d)=(\varphi_{\mu})_k + (\nabla\varphi_{\mu})_k^\top d + \frac{1}{2}d^\top W_kd + \nu\left|\left|\begin{bmatrix}g_k+(\nabla_x g)_k^\top d_x\\ h_k+(\nabla_x h)_k^\top d_x+s_k+d_s\end{bmatrix}\right|\right|,
\end{align}

where, e.g., $(\varphi_{\mu})_k=\varphi_{\mu}(x_k,s_k)$.
For this,

  • first, the penalty parameter $\nu$ is supposed to be calculated based on the predicted reduction in the merit function $\text{pred}(d)=m(0)-m(d)$ such that the predicted reduction is at least proportional to the constraint violation, i.e.,
    $$\text{pred}(d)\geq \rho\nu\left|\left|\begin{bmatrix}g_k\\ h_k+s_k\end{bmatrix}\right|\right|,$$
    which gives
    \begin{align}\label{eq:penaltyParameterCondition}
    \nu\geq\frac{(\nabla\varphi_{\mu})_k^\top d + \frac{1}{2}d^\top W_kd}{(1-\rho)\left|\left|\begin{bmatrix}g_k\\ h_k+s_k\end{bmatrix}\right|\right| }
    \end{align}

    since, according to 1, $g_k+(\nabla_x g)_k^\top d_x=0$ and $h_k+(\nabla_x h)_k^\top d_x+s_k+d_s=0$.
    My first question is, why is $\alpha^{\text{max}}$ not considered for these calculations, i.e., why is it not $d:=\alpha^{\text{max}}d$ in the above equation?
    This is not taken into account in any of the above-mentioned references.
    However, in 2, they do not make the assumption that
    $g_k+(\nabla_x g)_k^\top d_x=0$ and $h_k+(\nabla_x h)_k^\top d_x+s_k+d_s=0$, which would then need to be included in the penalty parameter condition. Also, what happens when the norm of the constraints is zero to within the constraint tolerance? Does $\nu$ then tend to infinity?!
  • second, according to 3 from above, once $\nu$ is obtained, the step length $\alpha_l=2^{-l}\alpha^{\text{max}},\ l=0,1,\dots$ is decreased until the Armijo-type condition
    \begin{align}
    \text{ared}(\alpha_ld) \geq \eta \text{pred}(\alpha_ld)
    \end{align}

    is fulfilled, where $\text{ared}(\alpha_ld)=\phi_{\nu}(x_k,s_k;\mu)-\phi_{\nu}(x_k+\alpha_ld_x,s_k+\alpha_ld_s;\mu)$ is the actual reduction.
    This makes sense to me, since the actual and predicted reductions then account for the decreasing step length.
    However, this is stated explicitly only in the third reference and in none of the others. Why would you ignore the step length in these calculations?
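The two steps above can be sketched in code; this is my own minimal sketch, not the actual KNITRO/IPOPT implementation, and the helper names, the feasibility tolerance, and the parameter defaults are illustrative assumptions:

```python
import numpy as np

def penalty_lower_bound(grad_phi, d, W, c_norm, rho=0.1):
    """Right-hand side of the inequality for nu, assuming the search
    direction satisfies the linearized constraints (so the model's
    constraint term vanishes at the full step).  Returns None when the
    iterate is numerically feasible, in which case nu should be left
    unchanged rather than driven to infinity."""
    quad = grad_phi @ d + 0.5 * d @ (W @ d)
    if c_norm <= 1e-10:  # illustrative feasibility tolerance
        return None
    return quad / ((1.0 - rho) * c_norm)

def backtrack(ared, pred, alpha_max, eta=1e-4, max_iter=30):
    """Armijo-type backtracking: halve alpha starting from the
    fraction-to-the-boundary step alpha_max until
    ared(alpha) >= eta * pred(alpha)."""
    alpha = alpha_max
    for _ in range(max_iter):
        if ared(alpha) >= eta * pred(alpha):
            return alpha
        alpha *= 0.5
    return None  # line search failed
```

Note that both `ared` and `pred` are evaluated at the scaled step $\alpha_l d$, matching the condition stated in the third reference.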

EDIT: added links to references

Best Answer

  1. The first question is reasonable to ask. Intuitively, it would make sense to consider the step size from the fraction-to-the-boundary rule when updating the penalty parameter. However, one would have to show that it can be carried through the rest of the analysis to be sure.

  2. Your statement about the penalty parameter is incomplete. The stated formula for $\nu$ is only used when the predicted-reduction condition is not satisfied for the most recent value of $\nu$; otherwise, $\nu$ is left unchanged. For example, if the current iterate is feasible, then the predicted-reduction condition is satisfied as long as the predicted reduction is nonnegative, in which case $\nu$ would not be changed.

  3. I don't think I understand the last question. The backtracking line search always starts from the step size from the fraction-to-the-boundary rule. This is crucial for the convergence analysis.
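Point 2 above amounts to a conditional update, which can be sketched as follows (hypothetical names; `nu_bound` stands for the right-hand side of the inequality for $\nu$, or `None` at a feasible iterate, and the margin is an illustrative choice):

```python
def update_penalty(nu, nu_bound, margin=1e-4):
    """Only increase nu when the current value violates the
    predicted-reduction condition; otherwise keep it, so a feasible
    iterate (nu_bound is None) never sends nu to infinity."""
    if nu_bound is None or nu >= nu_bound:
        return nu              # condition already holds: keep nu
    return nu_bound + margin   # raise nu just past the bound
```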
