Solved – Optimal algorithm for solving n-armed bandit problems

machine learningmultiarmed-banditreinforcement learning

I've read about a number of algorithms for solving n-armed bandit problems like $\epsilon$-greedy, softmax, and UCB1, but I'm having some trouble sorting through what approach is best for minimizing regret.

Is there a known optimal algorithm for solving the n-armed bandit problem? Is there a choice of algorithm that seems to perform best in practice?

Best Answer

Here are two survey papers I have found recently. I have not read them yet, but the abstracts sound promising.

Joann`s Vermorel and Mehryar Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation (2005)

From the abstract:

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms.

Volodymyr Kuleshov and Doina Precup: Algorithms for the multi-armed bandit problem (2000) From the abstract:

Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identiﬁes for each algorithm the settings where it performs well, and the settings where it performs poorly.

Best Answer

Related Solutions

Solved – Best bandit algorithm

Solved – Multi-armed bandit algorithms in Java

Related Question