This is an instance of a bilevel program: you can view it as an optimization problem in which one of the constraints is that a variable is optimal in some other optimization problem. It can be represented as a MILP via the KKT conditions.
Here is a test using the MATLAB toolbox YALMIP:
https://yalmip.github.io/tutorial/bilevelprogramming/
% Random data for a small LP: max c'*x s.t. A*x <= b, 0 <= x <= 1
A = randn(10,7);
b = 10*rand(10,1);
c = randn(7,1);

% Solve the LP once to record its optimal value z
x = sdpvar(7,1);
optimize([0 <= x <= 1, A*x <= b], -c'*x);
z = value(c'*x);

% Outer problem: y(i) = 1 forces x(i) = 0 in the inner problem.
% Find the smallest number of variables to knock out so that the
% inner optimum drops by at least 0.1.
y = binvar(7,1);
OuterC = [c'*x <= z - 0.1];
OuterO = sum(y);
InnerC = [0 <= x <= 1, x <= 1-y, A*x <= b];
InnerO = -c'*x;
solvebilevel(OuterC, OuterO, InnerC, InnerO, x);
value(x)
value(y)
This example uses the built-in naive bilevel solver. A typically better approach is to tell it to use an external MILP solver instead, or to manually use the kkt operator to define the MILP and then call the MILP solver yourself; see the sketch after the links below.
https://yalmip.github.io/example/bilevelprogrammingalternatives/
https://yalmip.github.io/command/kkt/
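For instance, here is a minimal sketch of the manual kkt route, reusing the variables from the example above (the call pattern follows the linked pages; any additional variable bounds needed for the big-M modelling of the complementarity conditions are omitted here):
% Derive the KKT system of the inner LP, treating y as a parameter.
% Complementarity is modelled with binary variables, so the result is a MILP.
[KKTsystem, details] = kkt(InnerC, InnerO, y);
% Solve the single-level (MILP) reformulation with an installed MILP solver
optimize([KKTsystem, OuterC], OuterO);
value(x)
value(y)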
So, you want to find $L^*(\lambda) = \inf_\alpha L(\alpha,\lambda)$.
short version:
There are two relevant cases.
First, if
$$
\left\lvert\sum_{i=1}^n \lambda_i y_i \phi(u^Tx_i)\right\rvert > 1
\quad\forall u\in A\subset\mathbb S^{d-1}
$$
holds for some measurable set $A$ with nonzero measure,
then one can show that $L^*(\lambda)=-\infty$.
In the second case, where
$$
\left\lvert\sum_{i=1}^n \lambda_i y_i \phi(u^Tx_i)\right\rvert \le 1
\quad\forall u\in \mathbb S^{d-1}
$$
holds, one can show that $\alpha=0$ is a minimizer of $L(\cdot,\lambda)$.
elaborations:
We define the function $w(u):=\sum_{i=1}^n \lambda_i y_i \phi(u^Tx_i)$.
Then one can show
$$
L(\alpha,\lambda)
=\sum_{i=1}^n \lambda_i + \|\alpha\|_1 - \langle \alpha ,w\rangle
$$
holds.
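For reference, here is the rearrangement behind this identity, under the assumption (suggested by the notation, but not quoted from the paper) that $L$ is the Lagrangian of minimizing $\|\alpha\|_1$ subject to the margin constraints $y_i\langle\alpha,\varphi(x_i)\rangle\ge 1$; one just swaps the finite sum with the integral:
$$
L(\alpha,\lambda)
=\|\alpha\|_1+\sum_{i=1}^n \lambda_i\Bigl(1-y_i\int_{\Bbb S^{d-1}}\alpha(u)\,\phi(u^Tx_i)\,\mathrm du\Bigr)
=\sum_{i=1}^n \lambda_i+\|\alpha\|_1-\int_{\Bbb S^{d-1}}\alpha(u)\,w(u)\,\mathrm du,
$$
and the last integral is exactly $\langle\alpha,w\rangle$.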
In the first case, we can
assume without loss of generality that $w\geq 1 + \delta$
on a measurable set $B$ with nonzero measure, for some $\delta>0$
(for $w<-1$ the proof works similarly).
We choose $\alpha_k\in \mathcal L^1$ such that $\alpha_k(u)=k$
on $B$ and $0$ otherwise.
Then we have
$$
\begin{aligned}
L(\alpha_k,\lambda)
&=\sum_{i=1}^n \lambda_i +
\int_{\Bbb S^{d-1}} \bigl(|\alpha_k(u)| - \alpha_k(u) w(u)\bigr)\,\mathrm du \\
&=\sum_{i=1}^n \lambda_i +
\int_B \bigl(|\alpha_k(u)| - \alpha_k(u) w(u)\bigr)\,\mathrm du \\
&\leq \sum_{i=1}^n \lambda_i +
\int_B \bigl(k - k(1+\delta)\bigr)\,\mathrm du
= \sum_{i=1}^n \lambda_i - k\,\delta\,\mu(B),
\end{aligned}
$$
which converges to $-\infty$ as $k\to\infty$ (here $\mu(B)>0$ is the measure of $B$).
Thus we have $L^*(\lambda)=-\infty$, which means that
these $\lambda$ are not feasible in the dual problem.
In the second case, we have
$|w(u)|\leq1$ for all $u$,
and therefore $|\alpha(u)|-\alpha(u) w(u)\geq0$
for all $u$.
This implies
$$
L(\alpha,\lambda)
=\sum_{i=1}^n \lambda_i +
\int_{\Bbb S^{d-1}} \bigl(|\alpha(u)| - \alpha(u) w(u)\bigr)\,\mathrm du
\geq
\sum_{i=1}^n \lambda_i
= L(0,\lambda)
$$
and therefore $L^*(\lambda)= \sum_{i=1}^n \lambda_i$.
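Combining the two cases (note that only the a.e. behavior of $w$ matters under the integral, so the dichotomy is exhaustive):
$$
L^*(\lambda)=
\begin{cases}
\sum_{i=1}^n \lambda_i & \text{if } |w(u)|\le 1 \text{ for a.e. } u\in\Bbb S^{d-1},\\
-\infty & \text{otherwise,}
\end{cases}
$$
i.e. exactly the $\lambda$ with $\|w\|_{L^\infty}\le 1$ are feasible in the dual problem.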
remarks regarding context:
Claim B.3 in the paper is used to show Lemma B.1.
However, I think Lemma B.1 is false (if I am understanding the notation correctly):
If $\operatorname{supp}(\alpha^*)$ is finite, then
$\alpha^*$ would only be nonzero on finitely many points, and $\langle \alpha^*,\varphi(x_i)\rangle=0$ would hold for all $i$
(since $\langle\cdot,\cdot\rangle$ represents an integral).
Thus, the optimal value would always be $0$ and $\alpha^*=0$
would always be the best possible solution to (3.6).
However, this should not be true for all data $x_i,y_i,\varphi$.
Best Answer
The step functions all add up in the same direction, since $\phi$ has the same sign in all $y_i$. Thus $q$ is minimal for every $\phi$ for which all step functions are $0$, which occurs for $\phi\lessgtr(k-x)/l$, where the inequality is $<$ or $>$ and $x$ is the greatest or least of the $x_i$, depending on whether $l$ is negative or positive, respectively.