How do I understand the meaning of the image in chapter 5 of the Artificial Intelligence book?
Minimax algorithm in Artificial Intelligence
artificial intelligence
Related Solutions
For applications of Differential Geometry in Computer Science, the following link is very useful: http://stat.fsu.edu/~anuj/CVPR_Tutorial/ShortCourse.htm
It talks about "Differential Geometric Methods for Shape Analysis and Activity Recognition".
One of the contributors to the above is Dr. Anuj Srivastava (http://stat.fsu.edu/~anuj/index.php). I liked the following line from his research areas page: "This technology is needed in many applications -- medical diagnosis, biometrics, video surveillance, undersea imaging, terrain mapping, and satellite image analysis."
Hope it helps!
Many models used in machine learning are either continuous optimization problems (linear / logistic regression, simple neural networks, SVMs) or have pieces that are continuous (more complicated neural networks, regression trees, etc.). A huge part of convex optimization is learning foundational methods to solve continuous and convex problems.
One common complaint is that many ML problems may be nonconvex (neural networks being the main one). However, many of the current tools used to solve nonconvex problems are either the same or small variants of the traditional tools presented for continuous convex problems, such as
- gradient descent, from which we get stochastic gradient descent, i.e. ML's favorite algorithm
- block coordinate descent or minimization
- acceleration schemes for first order methods (most famously, Nesterov's acceleration and Polyak's heavy ball)
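To make the first bullet concrete, here is a minimal sketch of batch gradient descent versus stochastic gradient descent on a toy least-squares problem. The problem sizes, step sizes, and names are my own illustration, not from any particular reference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize f(w) = ||Xw - y||^2 / (2n)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true  # noiseless, so the minimizer is w_true

def full_gradient(w):
    # Gradient of f using all n samples.
    return X.T @ (X @ w - y) / n

def stochastic_gradient(w, i):
    # Gradient from a single sample i: an unbiased estimate of full_gradient.
    return X[i] * (X[i] @ w - y[i])

# Batch gradient descent: one full-gradient step per iteration.
w_gd = np.zeros(d)
for _ in range(500):
    w_gd -= 0.5 * full_gradient(w_gd)

# Stochastic gradient descent: one random sample per iteration.
w_sgd = np.zeros(d)
for _ in range(2000):
    i = rng.integers(n)
    w_sgd -= 0.05 * stochastic_gradient(w_sgd, i)
```

SGD trades exactness per step for much cheaper steps; on this noiseless toy problem both land near `w_true`, and in general SGD settles into a neighborhood of the minimizer, which (as noted below) ML practitioners are usually happy with.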
In fact, methods like stochastic gradient descent, and even asynchronous versions (which are all the rage in ML today), appeared as early as Bertsekas' Nonlinear Programming book in the 1990s, which was primarily aimed at convex optimization. And methods like ADMM were proposed and analyzed in the 1970s. I don't see them appearing much in industrial ML, but in theoretical ML they are favored for the ease with which they decompose large problems and exploit sparsity. A key concept you will get from Boyd/Vandenberghe that you might not get from ML blogs is duality and dual forms, which are relevant in ML mostly because of SVMs. (For kernel SVMs, we always solve the dual form.)
Therefore, learning about convex optimization, even though it is restricted to convex problems, can be extremely helpful and foundational in understanding the newer, "hotter" methods of today.
That being said...
There are several key differences between learning ML and learning optimization (convex or otherwise), and I think it is important to be aware of them, so you don't put too much emphasis on skills that may have less relevance to ML. To me, they are
- Optimization folk care about training error. ML folk care about generalization error. In practice, what this means is that ML folk are completely comfortable with getting within a neighborhood of the true minimizer, since that often doesn't affect test error (and may even act as implicit regularization).
- Optimization folk tend to prefer simpler methods with better optimality properties. ML folk tend to prefer models that better fit the data structure, which may be highly generalizable but not necessarily "simple" from an optimization point of view. (e.g. decision trees)
- Most "foundational" optimization textbooks (like Boyd/Vandenberghe) came out of operations research (OR), which largely involves relaxations of combinatorial problems. Most machine learning motivations do not intersect with operations research very much.
- Many great optimization papers involve deriving rates, properties, and guarantees that do not match practice very well. I believe this is in part because of the previous point. (They do much better on operations research problems.)
- A specific artifact of the OR history is that it often seems convex optimization is very concerned with constraints and sensitivity analysis; in ML, this is often much less important in the downstream task.
- Another artifact of the OR history is the emphasis on smoothness. Many optimization methods get great convergence results by forcing the problem to be an unconstrained smooth problem. (The most famous of these are barrier methods, which turn into interior point methods, which is what default CVX mainly runs on.) In machine learning, we often don't care about this. Either we use a proximal operator and maintain the problem's original form, or we just ignore nonsmooth points altogether, and just converge to a neighborhood of the optimal variable without complaint. (Result of caring more about generalization error than training error.) The other thing this buys us is scalability, which is really the name of the game in today's data science world.
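To make the proximal-operator point concrete, here is a sketch of proximal gradient descent (ISTA) on a toy lasso problem. The nonsmooth $\ell_1$ term is handled exactly by its proximal operator (soft-thresholding) rather than being smoothed away; problem sizes and names here are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy lasso: minimize ||Xw - y||^2 / (2n) + lam * ||w||_1
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=n)
lam = 0.1

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrinks each coordinate toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Lipschitz constant of the smooth part's gradient: sigma_max(X)^2 / n.
L = np.linalg.norm(X, 2) ** 2 / n

w = np.zeros(d)
for _ in range(500):
    grad = X.T @ (X @ w - y) / n       # gradient step on the smooth term
    w = soft_threshold(w - grad / L, lam / L)  # prox step on the l1 term
```

The iterate ends up sparse: the prox zeroes out the coordinates outside the true support, which is exactly the structure a smoothed reformulation would lose.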
I'm guessing I wrote a lot of controversial things; I have worked on convex optimization for 7 years now and ML for the past 3 years. I suppose someone with a different background may have an entirely different perspective.
Best Answer
The minimax algorithm is a way to model an adversarial task between two agents, where one agent is trying to maximize a certain score and the other is trying to minimize it. A canonical example for this is chess, where if we have some measure of how good or bad a position is for white, then the player with white will want the position to be as good as possible and the player with black will want it to be as bad as possible.
The general idea is that if one player makes a move, the other player has a set of possible responses, and for each of those responses the first player again has a set of responses, and so on. This is what the tree represents: each node is a position on a given player's turn, and each edge is a possible move. If it's my turn with the white pieces, I want to pick the move with the best possible worst outcome: that is, assuming black always responds with the move that makes the position as bad as possible for me, I want to pick the move whose worst outcome is as good as possible. It doesn't matter if there are better outcomes after other moves, because black would never let us go down those paths. Similarly, if I'm playing black, I want to make the move that gives white the worst possible best outcome.
So for this problem, we can start from the second row and look at the decisions each MIN node gets. B can choose between getting scores of $3, 12$, or $8$ so it will choose $3,$ C can choose between $2, 4,$ and $6$ so it will choose $2$, and D can choose between $2, 5$, and $14$ so it will also choose $2.$
Now, if we look at the MAX node A on top, it sees that it can choose between nodes B, C, and D, with worst outcomes $3, 2,$ and $2$. Because A wants to maximize its score, it concludes that it should perform action $a_1$, leading to B, for a minimax value of $3.$
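The walk through the tree above can be sketched as a tiny recursive minimax, with the leaf scores from the figure hard-coded as nested lists (a sketch, not any book's reference implementation):

```python
def minimax(node, maximizing):
    """Return the minimax value of a node.

    A leaf is a bare number; an internal node is a list of children.
    `maximizing` says whose turn it is at this node.
    """
    if isinstance(node, (int, float)):  # leaf: just a score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Subtrees rooted at the MIN nodes B, C, D from the figure.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

values = [minimax(sub, maximizing=False) for sub in tree]
print(values)               # → [3, 2, 2], the choices of B, C, D
print(minimax(tree, True))  # → 3, the value at the MAX node A
```

The root takes the max over its children's values, each child takes the min over its leaves, reproducing the $3, 2, 2$ choices and the final value $3$ computed above.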
Hope this helps!