Solved – Maximization of Output based on Input

extreme value, machine learning, neural networks, optimization


I'm currently trying to maximize my output $y$, based on my inputs $X$.

Say there are inputs $X = \{x_1, x_2, x_3, \ldots, x_j\}$. Each example has values for all the features in $X$ and a $y$ value, i.e., one example is $(X_i, y_i)$.

What I want to do is find the values for $X = \{x_j\}$ that will produce the maximum $y$.

One (shitty) idea is to create a neural network with $j$ input nodes in the first layer and a single output node. Training the NN would let me predict the output $y$ from $X$, which on its own isn't helpful for my problem. So I would then generate a bunch of randomized values for $X$ and keep the one that produces the largest predicted $y$, which is obviously super inefficient.
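For concreteness, a minimal sketch of that idea, assuming scikit-learn and purely made-up placeholder data and feature ranges (nothing here comes from the actual experiment):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Placeholder (X_i, y_i) examples standing in for the real experimental data.
X_data = rng.uniform(0.0, 1.0, size=(200, 6))
y_data = np.sin(3 * X_data[:, 0]) - (X_data[:, 1] - 0.5) ** 2

# Step 1: fit a neural network that predicts y from X.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X_data, y_data)

# Step 2 (the inefficient part): sample random candidate inputs and keep the
# one with the largest predicted y.
candidates = rng.uniform(0.0, 1.0, size=(10_000, 6))
predictions = model.predict(candidates)
best = candidates[np.argmax(predictions)]
print("best candidate X:", best, "predicted y:", predictions.max())
```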

Another (shitty) idea is to train the NN as above, but this time use some sort of reinforcement learning to optimize the inputs. I don't know much about RL, but it seems unnecessary and unhelpful in this situation, because the training examples are varied, which means the $y$ value wouldn't be optimized? (I don't know much about this specifically.)

Is there a specific model or algorithm that would let me find the maximum $y$ based on my $X$ values?

This question is pretty similar to this one, but there they don't actually give a specific model to follow.

NEW EDIT: As requested by @whuber, a clearer explanation of the context. I am conducting an experiment with many parameters I am able to vary. I can then develop a large(ish) dataset in which each of the parameters/features somehow affects my final output $y$. The inputs are my features/parameters $X = \{x_j\}$, and each example $(X_i, y_i)$ contains a variation of the features called $X_i$, where $i$ indexes the training examples and $j$ indexes the features in $X$. I have no idea what the relationship between $X$ and $y$ is, so I don't know the function $f$ that maps $X \mapsto y$.

I want to know what changes I can make to my parameters ($X = \{x_j\}$) in my experiment in order to maximize my output ($y$).

Best Answer

The OP has answered this question by stating:

I think I have found what I was looking for.

I can approximate my $ y = f(x_1, x_2, x_3, x_4, x_5, x_6) = g(x_1, x_2, x_3, x_4, x_5, x_6) + \varepsilon $ using an Artificial Neural Net, then from there using a genetic algorithm to optimize the output.

but I don't think that this is a good solution.

Using a neural network inherently introduces lots of hard problems: selecting how many neurons in what configuration, an initialization method, an activation function, an optimizer, a learning rate, regularization strategies (L1, L2, dropout, mixup, max norm...) -- to succeed, all of these things must be chosen to be "just right" for whatever problem you're trying to solve. Then the genetic algorithm step adds more complexity, and more tuning, on top of that.

If you generalize from using a neural network to any function-approximation method, you are employing a surrogate surface, which you denote as $g$, and performing the optimization over that surface. There's no particular need to use a neural network. A popular method to use in place of a neural network is a Gaussian process. See: Jones et al. (1998), "Efficient Global Optimization of Expensive Black-Box Functions."
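As a concrete illustration, here is a minimal sketch of fitting a Gaussian-process surrogate with scikit-learn; the placeholder data, the six-feature layout, and the RBF kernel choice are assumptions made for the example, not details from the question:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

# Placeholder (X_i, y_i) data: 50 experiments, 6 parameters each.
X_data = rng.uniform(0.0, 1.0, size=(50, 6))
y_data = np.sin(3 * X_data[:, 0]) - (X_data[:, 1] - 0.5) ** 2 + 0.01 * rng.standard_normal(50)

# RBF kernel: the length_scale is the one "knob" controlling how fast the
# surrogate g is allowed to vary between observed points.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, n_restarts_optimizer=5)
gp.fit(X_data, y_data)   # the length-scale is refined by maximum likelihood here
print(gp.kernel_)        # fitted kernel, including the learned length-scale
```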

One desirable attribute is that your surrogate surface interpolates well between your data points, changing neither too quickly nor too slowly. A well-studied class of surrogate surfaces for this type of problem is the Gaussian process; these methods are usually characterized by a length-scale parameter, which directly controls the behavior between data points. In this sense, there is essentially only one "knob," the length-scale, that you have to fiddle with.
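To see what the length-scale "knob" does, here is a small sketch (reusing the `X_data` and `y_data` arrays from the example above) that fits the same GP with a few fixed length-scales and compares predictions at a point slightly away from the data:

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A point slightly away from the first training example.
x_test = X_data[:1] + 0.2

for ls in (0.05, 0.5, 5.0):
    gp_fixed = GaussianProcessRegressor(
        kernel=RBF(length_scale=ls, length_scale_bounds="fixed"),
        normalize_y=True,
    )
    gp_fixed.fit(X_data, y_data)
    mean, std = gp_fixed.predict(x_test, return_std=True)
    # Short length-scale: the surrogate "forgets" the data quickly and the
    # predictive uncertainty stays large away from the data; long length-scale:
    # it varies slowly and remains confident there.
    print(f"length_scale={ls}: prediction={mean[0]:.3f}, std={std[0]:.3f}")
```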

Once you're satisfied that you have a "good" surrogate surface, you can optimize it directly. The values of $g$ and its gradients are easy to compute and cheap to evaluate, so you can quite readily use multi-start optimization, or really any global optimization method, to find optima on the surface.
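A minimal sketch of that last step, reusing the fitted `gp` and `X_data` from the earlier example; the $[0, 1]$ bounds are an assumption standing in for the real experimental limits, and SciPy's L-BFGS-B is just one convenient local optimizer among many:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed feasible range [0, 1] for each parameter x_j.
bounds = [(0.0, 1.0)] * X_data.shape[1]

def neg_surrogate(x):
    # Minimizing -g(x) is the same as maximizing the surrogate mean g(x).
    return -gp.predict(x.reshape(1, -1))[0]

rng = np.random.default_rng(1)
starts = rng.uniform(0.0, 1.0, size=(20, X_data.shape[1]))
results = [minimize(neg_surrogate, x0, method="L-BFGS-B", bounds=bounds) for x0 in starts]
best = min(results, key=lambda r: r.fun)
print("candidate optimum X:", best.x, "predicted y:", -best.fun)
```

The best candidate found this way is, of course, only an optimum of the surrogate; the natural next step is to run the experiment at that setting and check the prediction against the measured $y$.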