Optimization with box constraints via a nonlinear function

Tags: convex optimization, convex analysis, nonlinear optimization, numerical methods, optimization

I have the following convex optimization problem over $\mathbb{R}^n$ with box constraints:

$$\begin{align}\text{minimize }&\;f(x)\\
\text{subject to }&\;x \in [-1,1]^n\end{align}$$

I can see two approaches to handle this:

Approach 1. Use a dedicated convex optimization method that handles box constraints (e.g., projected gradient descent; L-BFGS-B, i.e., L-BFGS with box constraints).

Approach 2. Apply a nonlinear transform to force $x$ to be in the box $[-1,1]^n$. In particular, define

$$g(w) = f(\varphi(w))$$

where $\varphi: \mathbb{R}^n \to \mathbb{R}^n$ applies the hyperbolic tangent coordinate-wise, i.e., $\varphi(w)_i = \tanh w_i$. Then, solve the unconstrained optimization problem

$$\text{minimize }\;g(w)$$

where now we allow $w$ to range over all of $\mathbb{R}^n$. Note that a solution $w$ to this unconstrained problem immediately yields a solution to the original problem, by taking $x = \varphi(w)$. We can then use any optimization procedure to solve the unconstrained problem, not just ones that explicitly support box constraints. Basically, I'm just applying a substitution (a change of variables) to force the box constraints to always hold.
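
For concreteness, here is a minimal sketch of both approaches using SciPy. The quadratic `f` below is just a stand-in for my real objective; I've placed its unconstrained minimum outside the box so that the constraints are active:

```python
import numpy as np
from scipy.optimize import minimize

n = 5
rng = np.random.default_rng(0)
c = rng.uniform(1.0, 2.0, size=n)  # targets outside [-1, 1], so bounds bind

def f(x):
    # stand-in convex objective; replace with the real f
    return 0.5 * np.sum((x - c) ** 2)

def grad_f(x):
    return x - c

# Approach 1: a solver with native box support (L-BFGS-B).
res1 = minimize(f, np.zeros(n), jac=grad_f, method="L-BFGS-B",
                bounds=[(-1.0, 1.0)] * n)

# Approach 2: tanh change of variables, then an unconstrained solver.
def g(w):
    return f(np.tanh(w))

def grad_g(w):
    # chain rule: (grad g)(w)_i = (grad f)(tanh w)_i * (1 - tanh^2 w_i)
    t = np.tanh(w)
    return grad_f(t) * (1.0 - t ** 2)

res2 = minimize(g, np.zeros(n), jac=grad_g, method="BFGS")

print("Approach 1:", res1.x)           # components pinned at exactly +1
print("Approach 2:", np.tanh(res2.x))  # close to +1, but never exactly there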

My question. Is there any reason to prefer one approach over the other? Should we expect one to converge significantly faster, or be more robust, or something?

In the readings I've done so far, I've seen lots of references to Approach 1, but I've never seen anyone mention Approach 2. Are there any pitfalls with Approach 2 that I should be aware of?

I have a convex optimization solver that seems to work well in my domain, but it doesn't support box constraints. So, if there are no pitfalls or shortcomings to Approach 2, it would be convenient to apply the solver I already have to the unconstrained problem. (The existing solver has many well-tuned heuristics; it would be painful to implement projected gradient descent or L-BFGS from scratch and re-create all of those heuristics.) However, I want to find out whether there are pitfalls of Approach 2 that I might not be aware of.

Approach 2 seems vaguely reminiscent of barrier methods (or interior point methods), though it's not the same. Does the theory in that area offer any relevant insights into this question?
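
For reference, my understanding is that the standard log-barrier subproblem for this box would be

$$\text{minimize }\; f(x) - \mu \sum_{i=1}^n \log(1 - x_i^2)$$

with the barrier parameter $\mu > 0$ driven toward zero over a sequence of solves. One structural difference I can already see: the barrier term is convex on $(-1,1)^n$, so each barrier subproblem remains convex, whereas my $\tanh$ substitution is a single fixed reparameterization and I haven't checked whether $g$ stays convex.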

Best Answer

A solution to your original problem probably has some components equal to $\pm 1$, but the hyperbolic tangent never attains $\pm 1$. Thus, the reformulated problem probably has no minimizer: its infimum is approached only as the corresponding components of $w$ tend to $\pm\infty$. Also, $g$ is probably not convex, so convexity has been lost.
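
To make both failure modes concrete, consider the one-variable example $f(x) = (x-2)^2$ on $[-1,1]$, whose minimizer is $x^\star = 1$. The reformulated objective is

$$g(w) = (\tanh w - 2)^2, \qquad g'(w) = 2(\tanh w - 2)\operatorname{sech}^2 w < 0 \;\text{ for all } w,$$

so $g$ is strictly decreasing on $\mathbb{R}$ and attains no minimum: an unconstrained solver will push $w \to +\infty$, typically stopping only when the $\operatorname{sech}^2 w$ factor makes the gradient numerically zero. A short computation also shows that $g''$ changes sign, so $g$ is not convex.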

You mentioned the projected gradient method. You might also consider FISTA or the TFOCS software package.
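
If you do end up writing something yourself, FISTA for a box constraint is only a few lines, because the projection onto $[-1,1]^n$ is componentwise clipping. A minimal sketch, assuming $\nabla f$ is Lipschitz with a known constant `L` (the helper name `fista_box` is mine):

```python
import numpy as np

def fista_box(grad_f, x0, L, iters=500):
    """FISTA (Beck-Teboulle) with projection onto the box [-1, 1]^n.

    grad_f -- gradient of the smooth convex objective f
    L      -- Lipschitz constant of grad_f (step size 1/L)
    """
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        # projected gradient step: the prox of the box indicator is clipping
        x = np.clip(y - grad_f(y) / L, -1.0, 1.0)
        # standard FISTA momentum update
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev
```

With the $1/L$ step this attains the usual $O(1/k^2)$ objective-value rate for convex $f$, which is the main reason to prefer it over plain projected gradient.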
