I've read about a number of algorithms for solving n-armed bandit problems like $\epsilon$-greedy, softmax, and UCB1, but I'm having some trouble sorting through what approach is best for minimizing regret.
Is there a known optimal algorithm for solving the n-armed bandit problem? Is there a choice of algorithm that seems to perform best in practice?
Best Answer
Here are two survey papers I have found recently. I have not read them yet, but the abstracts sound promising.
Joann`s Vermorel and Mehryar Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation (2005)
From the abstract:
Volodymyr Kuleshov and Doina Precup: Algorithms for the multi-armed bandit problem (2000) From the abstract: