Solved – Optimal algorithm for solving n-armed bandit problems

machine learningmultiarmed-banditreinforcement learning

I've read about a number of algorithms for solving n-armed bandit problems like $\epsilon$-greedy, softmax, and UCB1, but I'm having some trouble sorting through what approach is best for minimizing regret.

Is there a known optimal algorithm for solving the n-armed bandit problem? Is there a choice of algorithm that seems to perform best in practice?

Best Answer

Here are two survey papers I have found recently. I have not read them yet, but the abstracts sound promising.

Joann`s Vermorel and Mehryar Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation (2005)

From the abstract:

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms.

Volodymyr Kuleshov and Doina Precup: Algorithms for the multi-armed bandit problem (2000) From the abstract:

Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identiļ¬es for each algorithm the settings where it performs well, and the settings where it performs poorly.