Solved – Multi-armed bandit algorithms in Java

javamultiarmed-bandit

Multi-armed_bandit problem defenition from Wikipeda:

"In probability theory, the multi-armed bandit problem (sometimes called the K-[1] or N-armed bandit problem) is the problem a gambler faces at a row of slot machines, sometimes known as "one-armed bandits", when deciding which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls" (http://en.wikipedia.org/wiki/Multi-armed_bandit)

In other words Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects / 'arms'. Such problems have a conflict between making decisions (allocating resources) that yield high current rewards, versus making decisions that sacrifice current gains with the prospect of better future rewards.

Any Java implementations for these algorithms?

Best Answer

You're only describing a problem. There are many different algorithms that can be applied to solve it, see e.g. http://lane.compbio.cmu.edu/courses/slides_ucb.pdf.

The UCB1 algorithm is so dead simple, reading Java code to understand how it works is probably a bad idea - unless you want to learn Java:

Play machine $j$, that maximizes

$$\bar{x}_j+\sqrt{\frac{2\ln n}{n_j}},$$ where $\bar{x}_j$ is the average reward obtained from machine $j$, $n_j$ is how often $j$ has been played and $n$ is the number of total plays. The 2 in the formula is a constant which can be changed to tune the exploration vs exploitation of the algorithm (higher number results in more exploration).

Related Question