[Math] Using max/min operators in linear programming.

linear-programming, markov-process

I'm currently implementing a Markov Decision Process with the solver GLPK, following the lecture by Vincent Conitzer, and there is a step I don't understand between the theoretical problem and its implementation.

More precisely, I have a set of states $\Sigma$ and a set of actions $A$, such that given two states $s$ and $s'$, $P(s, a, s')$ denotes the probability of going from the state $s$ to the state $s'$ with the action $a$. Moreover, for each state $s$ and each action $a$, $R(s, a)$ denotes the immediate reward of executing the action $a$ from the state $s$.

A policy is a function $\delta : \Sigma \to A$, which decides which action to perform in each state, and the value of a policy $\delta$ for a given state $s$ is given by

$$
V^\delta(s) = R(s, \delta(s)) + \beta \sum_{s'} P(s, \delta(s), s') \cdot V^\delta(s')
$$
where $\beta$ is a discount factor between 0 and 1. The goal is to find the optimal policy $\delta^*$ which maximises the value for each state. In other words, the problem we are trying to solve is to calculate the function $V^*$ (corresponding to the value of $\delta^*$) such that:
$$
V^*(s) = \max_{a} \,\,[R(s, a) + \beta \sum_{s'} P(s, a, s') \cdot V^*(s')]
$$
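For a fixed policy $\delta$, the first of the two equations above is linear in the unknown values $V^\delta(s)$, so evaluating a given policy only requires solving a linear system; it is the $\max$ over actions in the second equation that breaks linearity (which is your first question below). As a minimal sketch in Python with NumPy, on a hypothetical two-state, two-action MDP whose numbers are invented purely for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; all numbers invented for illustration.
# P[s, a, s2] = probability of going from state s to s2 with action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions from state 1
])
R = np.array([
    [1.0, 0.0],                 # R(0, a) for each action a
    [0.0, 2.0],                 # R(1, a) for each action a
])
beta = 0.9

delta = np.array([0, 1])        # a fixed policy: state -> action

# Restrict P and R to the action chosen by the policy in each state.
P_d = P[np.arange(2), delta]    # shape (2, 2): P(s, delta(s), s2)
R_d = R[np.arange(2), delta]    # shape (2,):   R(s, delta(s))

# V = R_d + beta * P_d @ V  <=>  (I - beta * P_d) @ V = R_d
V = np.linalg.solve(np.eye(2) - beta * P_d, R_d)
print(V)                        # V^delta(s) for each state s
```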
Up to this point, I have no problem, but then when I look at how to solve this using linear programming, I find the following (PDF, page 5):

As in the game theory example above, we are confronted with the difficulty that we cannot use the max operator in a linear program. The solution is similar. We use the following linear program:

minimize $\sum_{s} V^*(s)$

subject to $\forall s, a \,\, V^*(s) \geq R(s, a) + \beta \sum_{s'} P(s, a, s') \cdot V^*(s') $
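For concreteness, this linear program can be set up and solved directly. The sketch below uses `scipy.optimize.linprog` so that it stays self-contained; the same constraint matrix could equally be handed to GLPK through any modelling front end. The MDP numbers are the same invented ones as above:

```python
import numpy as np
from scipy.optimize import linprog

# Same hypothetical 2-state, 2-action MDP as above (invented numbers).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
beta = 0.9
S, A = R.shape

c = np.ones(S)                   # objective: minimize sum_s V(s)

# One constraint per pair (s, a):
#   V(s) - beta * sum_{s2} P(s, a, s2) * V(s2) >= R(s, a)
# linprog expects A_ub @ x <= b_ub, so both sides are negated.
A_ub, b_ub = [], []
for s in range(S):
    for a in range(A):
        row = beta * P[s, a].copy()
        row[s] -= 1.0            # coefficients of beta * P @ V - V(s)
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * S)
V_star = res.x                   # the optimal values V*(s)

# An optimal action in each state is one whose constraint is tight,
# i.e. one attaining the max in the Bellman equation.
policy = (R + beta * (P @ V_star)).argmax(axis=1)
print(V_star, policy)
```

Once $V^*$ is found, a policy is read off as in the last two lines: for each state, at least one constraint holds with equality, and the corresponding action attains the max.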

The game theory argument it refers to is the following (page 6 of the same PDF):

There is an interesting trick in this linear program. We would like to write $u = \max_i \sum_{j=1}^n p_j u_1(i, j)$, but this is not linear because of the max operator. So, instead, the linear program requires that $u$ is at least $\sum_{j=1}^n p_j u_1(i, j)$ for every $i$, thereby forcing it to be at least $\max_i \sum_{j=1}^n p_j u_1(i, j)$. Then, in the optimal solution, $u$ will be set exactly to $\max_i \sum_{j=1}^n p_j u_1(i, j)$, because we try to minimize $u$.
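To see the trick on a throwaway numeric instance: suppose the linear expressions $\sum_{j=1}^n p_j u_1(i, j)$ happen to evaluate to $3$, $5$ and $4$ for the three values of $i$. The program
$$\text{minimize } u \quad \text{subject to} \quad u \geq 3,\; u \geq 5,\; u \geq 4$$
has the unique optimum $u = 5 = \max(3, 5, 4)$: the constraints push $u$ up to at least the largest value, and the minimization pushes it back down onto it.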

My first problem is that I don't see why using the max operator makes the program non-linear; secondly, although this solution works, I am not sure I exactly understand why.

Best Answer

For your first question, consider the function $z = \max(x_1, x_2)$. Fix $x_2 = 1$ and let $x_1$ range over $[0, 2]$. Then it is easy to see that $z$ is non-linear, for it is constant for $x_1 \in [0, 1]$, but increasing for $x_1 \in [1, 2]$.
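Concretely, a linear function of $(x_1, x_2)$ would have to be additive, and $\max$ is not: taking the points $(2, 0)$ and $(0, 2)$,
$$\max(2, 0) + \max(0, 2) = 4 \;\neq\; 2 = \max(2 + 0,\ 0 + 2).$$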

As to getting around this problem, when you introduce the dummy variable $u$ with the conditions $u \geq \sum_{j=1}^n p_j u_1(i, j)$, one for each $i$, notice that these are the only conditions in which $u$ appears. They only bound $u$ from below, so if $u$ were strictly greater than all such linear combinations, nothing would prevent the minimization from decreasing it further. Therefore, in any optimal solution, $u$ must take on the value of one of the $\sum_{j=1}^n p_j u_1(i, j)$ (specifically, the one with the greatest value). Think of $u$ as a ceiling that drops down and settles on the highest point.
