How can a computer algorithm optimize a discontinuous function

algorithms, optimization, soft-question

Let the function $f : [0,1] \to \Bbb R$ be defined by

$$f (x) := \begin{cases} \left| x - \frac{1}{\pi} \right| & \text{ if } x \neq \frac{1}{\pi}\\ 10 & \text{ if } x = \frac{1}{\pi}\end{cases}$$

Can problems like $$\max_{x \in [0,1]} f(x)$$ be solved by a computer algorithm? Is it possible to prove that no computer algorithm is capable of optimizing functions of this type?

Background: we are trying to optimize $\phi: \Delta \to \mathbb N$, where $\Delta$ stands for the collection of all probability measures. Of course, any algorithm can only handle finitely many values, so we have to discretize $\Delta$ into steps. However, we cannot exclude the possibility that the optima lie between the steps.

I've looked into some literature on applying mollification or smooth approximation, but my $f(x)$ might just become a plain absolute-value function after "mollification", because the set $\{1/\pi\}$ has measure zero and carries no weight.
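To make this concrete: for any mollifier $\rho_\varepsilon$, changing $f$ on the null set $\{1/\pi\}$ does not change the convolution at all,

$$(f * \rho_\varepsilon)(x) = \int_0^1 f(y)\,\rho_\varepsilon(x - y)\,dy = \int_0^1 \left|y - \tfrac{1}{\pi}\right|\,\rho_\varepsilon(x - y)\,dy,$$

so the mollified $f$ coincides with the mollified absolute-value function and the spike at $1/\pi$ disappears.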

Best Answer

It depends on what you mean by "optimize" or "solve". Essentially, there are (at least) two distinct notions of a computer accessing a function --- white-box access and black-box access.

Black-box access is somewhat more common (within theoretical computer science, at least). You view the computer as having some "oracle" $\mathcal{O}(\cdot)$ which it can query. It has no idea of the oracle's inner workings, but it can send the oracle inputs and receive outputs back. You can view this as having an API for some function while having absolutely no clue how it is implemented.
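As a minimal sketch of the black-box model (my own illustration, not part of the original question): the optimizer sees $f$ only as an opaque callable, and a uniform grid search on the question's $f$ misses the spike at $1/\pi$ entirely.

```python
from math import pi

# Black-box access: the optimizer receives f only as an opaque callable.
def f(x):
    return 10.0 if x == 1 / pi else abs(x - 1 / pi)

def grid_search_max(oracle, n):
    """Query the oracle on a uniform grid of n+1 points in [0, 1]."""
    grid = [i / n for i in range(n + 1)]
    best = max(grid, key=oracle)
    return best, oracle(best)

# No matter how fine the grid, it never contains the irrational point 1/pi,
# so the search reports max ~ 1 - 1/pi ~ 0.682 at x = 1, never the true 10.
print(grid_search_max(f, 10**6))
```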

White box access means that you give the computer "the code of the function" in some way. Maybe (theoretically) you give it $M$, the description of a Turing Machine (or some other model of computation) which computes the function. Maybe (practically) you give the computer access to the file in which the function is defined.

We'll discretize to make things simpler (from a computational perspective). Let $f : D \to R$ be the function you want to optimize, with $D$ finite. Then, for an arbitrary discontinuous function $f$, black-box optimization takes at least $|D|$ queries to the oracle, i.e. you must check every point.

The proof is as follows. Say you have some optimization algorithm which queries the oracle for the function $f$ at the points $D' \subsetneq D$. Let $x \in D \setminus D'$ be a point which is not queried. Then the algorithm is incorrect on the function $f'$ which agrees with $f$ on $D'$ but has $f'(x) = 1 + \max_{y \in D} f(y)$: the algorithm sees identical query answers for $f$ and $f'$, so it returns the same output for both, yet their maxima differ. Note that this (simplistic) proof stops working if your optimization algorithm is randomized, but I believe your general intuition still holds: one cannot optimize arbitrary discontinuous functions.
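Here is a small illustration of that adversary argument (a sketch of mine, using an arbitrary toy $f$ on $D = \{0,\dots,9\}$): given the points an algorithm actually queried, we construct an $f'$ it cannot distinguish from $f$.

```python
def adversary(f, domain, queried):
    """Build f' agreeing with f on every queried point, but whose maximum
    sits at a skipped point and strictly exceeds max f."""
    skipped = next(x for x in domain if x not in queried)
    bigger = 1 + max(f(x) for x in domain)
    return lambda x: bigger if x == skipped else f(x)

domain = range(10)
f = lambda x: x % 3              # arbitrary toy function on D
queried = set(domain) - {3}      # an algorithm that skipped x = 3
f_prime = adversary(f, domain, queried)

# The algorithm's view of f and f' is identical on the queried points...
assert all(f(x) == f_prime(x) for x in queried)
# ...yet their maxima differ, so its answer is wrong for at least one.
print(max(map(f, domain)), max(map(f_prime, domain)))   # 2 vs 3
```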

In the white-box model there are things you can do, but it is highly dependent on how the function is presented. Your function (for example) can be represented via a tree, where at each internal node you compute some predicate of the input and then branch left or right accordingly (this is formalized via the notion of a decision tree). While I do not know the state of the art for optimizing functions represented as decision trees, the following simple algorithm shows you get "more power" than in the black-box query model --- each leaf of the tree corresponds to an output. Sort the leaves in decreasing order of output, then loop through them and check whether there exists an input which satisfies all predicates on the internal nodes along the path to that leaf. The first leaf for which such an input exists gives the maximum value; see the sketch below.
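As a hedged sketch of that leaf-sorting idea (my own encoding, not a standard formalism), represent the question's $f$ as a depth-two tree whose two leaves carry their attainable values together with a feasibility check for the path constraints:

```python
from math import pi

SPIKE = 1 / pi

# Each leaf records (best value attainable on that branch, feasibility check:
# does some x in [0, 1] satisfy every predicate on the path to the leaf?).
leaves = [
    # branch "x == 1/pi": constant leaf 10; feasible iff 1/pi lies in [0, 1]
    (10.0, lambda: 0.0 <= SPIKE <= 1.0),
    # branch "x != 1/pi": leaf |x - 1/pi|; its sup on [0, 1] is 1 - 1/pi,
    # attained at x = 1 (which does satisfy x != 1/pi)
    (1.0 - SPIKE, lambda: True),
]

# Sort leaf values in decreasing order; the first feasible one is the max.
for value, feasible in sorted(leaves, key=lambda leaf: -leaf[0]):
    if feasible():
        print("max of f over [0, 1]:", value)   # -> 10.0
        break
```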

As for why this is "more power", technically the amount of power depends on the complexity of the underlying decision tree. But for your function things reduce to analyzing a binary tree of depth two (which is quite simple). How long the black-box model takes depends on how exactly you discretize things, but if you wanted to "query all floating point numbers" (for example) it suffices to say it would take a while.

Note that for more general notions of computation (say, Turing machines) little is possible, for the "standard reasons" (the Turing machines may not even halt). That being said, if you know the function is "nice enough", you could imagine doing something similar to what I described for decision trees, applied to the control flow graph of your function. A decision tree's control flow graph is especially simple --- it's a tree (instead of a general graph), and there is no mutation of variables.

There may be thorny details when one steps away from the discretized world I just discussed, but I (unfortunately) cannot comment on those.
