Equivalence of optimization problems via change of variables

Tags: change-of-variable, optimization

I have read the following paper: https://people.eecs.berkeley.edu/~elghaoui/Pubs/SparseLearningBoolean.pdf

I have a question about the proof for Theorem 1 on page 69. Consider
\begin{align}
\text{P}_1 := \min_{\substack{w \in \mathbb{R}^d \\ \|w\|_0 \leq k}}\left\{ \sum_{i=1}^n f(\langle x_i,w \rangle;y_i) + \frac{1}{2}\rho\|w\|_2^2 \right\}.
\end{align}

Now they make the change of variable $w \mapsto \text{diag}(u)w$ for a fixed $u \in \{0,1\}^d$ and claim that the above problem is equivalent to

\begin{align}\label{eq1}
\text{P}_2 := \min_{\substack{u \in \{0,1\}^d \\ \sum_{j=1}^d u_j \leq k}} \min_{w \in \mathbb{R}^d }\left\{ \sum_{i=1}^n f(\langle \text{diag}(u) x_i,w \rangle;y_i) + \frac{1}{2}\rho\|w\|_2^2 \right\}.
\end{align}

I don't see why this is true. How can I reconstruct a solution of $P_1$ from a solution of $P_2$?

Best Answer

Note that, since $\text{diag}(u)$ is symmetric, $$\langle \text{diag}(u)x_i,w \rangle = x_i^T \text{diag}(u) w = \langle x_i,\text{diag}(u)w \rangle.$$ This confirms that the substitution $w \mapsto \text{diag}(u)w$ was indeed applied.
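The identity is easy to confirm numerically; here is a quick sanity check with arbitrary random values (all names are illustrative):

```python
import numpy as np

# Check <diag(u) x, w> = <x, diag(u) w> for a random binary mask u
# and random vectors x, w.  diag(u) is symmetric, so moving it across
# the inner product changes nothing.
rng = np.random.default_rng(1)
d = 6
u = rng.integers(0, 2, size=d).astype(float)
x = rng.normal(size=d)
w = rng.normal(size=d)

lhs = np.dot(np.diag(u) @ x, w)
rhs = np.dot(x, np.diag(u) @ w)
print(np.isclose(lhs, rhs))  # True
```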

In $P_2$, if the optimal solution has $u_j=0$, then $w_j=0$: a nonzero $w_j$ would not affect the first term of the objective (the $j$-th feature is zeroed out by $\text{diag}(u)$) but would increase the penalty term. Hence the optimal $w^*$ of $P_2$ satisfies $\text{diag}(u^*)w^* = w^*$ and $\|w^*\|_0 \leq \sum_{j=1}^d u_j^* \leq k$, so $w^*$ is feasible for $P_1$ with the same objective value. Conversely, any $w$ feasible for $P_1$ yields a feasible pair for $P_2$ with the same objective value by taking $u_j = 1$ exactly on the support of $w$. The two optimal values therefore coincide, and the optimal $w^*$ of $P_2$ is itself an optimal solution of $P_1$; no reconstruction step is needed.
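For a concrete sanity check, one can brute-force both formulations on a tiny instance. The sketch below assumes the squared loss $f(t;y) = \frac{1}{2}(t-y)^2$ (the paper allows general losses; this is just one convenient choice, since the inner minimization then has a closed form), and all names and the toy data are illustrative:

```python
import itertools
import numpy as np

# Toy instance with squared loss f(t; y) = 0.5 * (t - y)**2.
rng = np.random.default_rng(0)
n, d, k, rho = 10, 5, 2, 0.7
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def p1_support(S):
    """P_1 objective minimized over w supported on S (ridge on columns S)."""
    if len(S) == 0:
        return 0.5 * y @ y
    Xs = X[:, list(S)]
    ws = np.linalg.solve(Xs.T @ Xs + rho * np.eye(len(S)), Xs.T @ y)
    r = Xs @ ws - y
    return 0.5 * r @ r + 0.5 * rho * ws @ ws

def p2_inner(u):
    """Inner minimum of P_2 over all w in R^d for a fixed mask u."""
    A = X * u                      # X @ diag(u): masked feature matrix
    w = np.linalg.solve(A.T @ A + rho * np.eye(d), A.T @ y)
    r = A @ w - y
    return 0.5 * r @ r + 0.5 * rho * w @ w

# P_1: enumerate all supports of size at most k.
p1 = min(p1_support(S)
         for r in range(k + 1)
         for S in itertools.combinations(range(d), r))

# P_2: enumerate all masks u in {0,1}^d with sum(u) <= k.
p2 = min(p2_inner(np.array(u, dtype=float))
         for u in itertools.product([0, 1], repeat=d)
         if sum(u) <= k)

print(abs(p1 - p2) < 1e-9)  # True: both formulations attain the same value
```

Note that `p2_inner` minimizes over an unconstrained $w \in \mathbb{R}^d$, yet the ridge penalty automatically zeroes out the masked coordinates, which is exactly the argument above.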
