Context: I am a programmer with some (half-forgotten) experience in statistics from uni courses. Recently I stumbled upon http://akinator.com and spent some time trying to make it fail. And who hasn't? 🙂
I've decided to find out how it could work. After googling, reading related blog posts, and adding some of my (limited) knowledge to the resulting mix, I came up with the following model (I'm sure I'll use the wrong notation, please don't kill me for that):
There are Subjects (S) and Questions (Q). The goal of the predictor is to select the subject S with the greatest a posteriori probability of being the subject the user is thinking about, given the questions and answers collected so far.
Let game G be a set of questions asked and answers given: $\{q_1, a_1\}, \{q_2, a_2\} … \{q_n, a_n\}$.
Then the predictor is looking for $P(S|G) = \frac{P(G|S) \cdot P(S)}{P(G)}$.
The prior for subjects ($P(S)$) could simply be the number of times a subject has been guessed divided by the total number of games.
Assuming that all answers are independent, we can compute the likelihood of subject S given the game G like so:
$P(G|S) = \prod_{i=1}^{n} P(\{q_i, a_i\} \mid S)$
We can calculate $P(\{q_i, a_i\} \mid S)$ if we keep track of which questions and answers were given when the user was thinking of a given subject:
$P(\{q, a\} \mid S) = \frac{\text{number of times answer } a \text{ was given to question } q \text{ in games where } S \text{ was the subject}}{\text{number of times } q \text{ was asked in games involving } S}$
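Under these definitions the whole update fits in a few lines. Here is a minimal Python sketch of the count-based posterior; the table names (`answer_counts`, `asked_counts`, `games_won`) are mine, and the small pseudo-count `eps` is an assumption I add so a single unseen (question, answer) pair doesn't zero out a subject:

```python
def posterior(game, subjects, games_won, total_games,
              answer_counts, asked_counts, eps=1e-3):
    """Normalized posterior P(S | G) under the answer-independence assumption.

    game                     -- list of (question, answer) pairs collected so far
    games_won[s]             -- times s was the guessed subject (for the prior P(S))
    answer_counts[(s, q, a)] -- times answer a was given to q when s was the subject
    asked_counts[(s, q)]     -- times q was asked in games involving s
    eps                      -- smoothing pseudo-count (assumed; any Dirichlet-style
                                prior would do)
    """
    scores = {}
    for s in subjects:
        p = games_won.get(s, 0) / total_games            # prior P(S)
        for q, a in game:                                # likelihood ∏ P({q, a} | S)
            num = answer_counts.get((s, q, a), 0) + eps
            den = asked_counts.get((s, q), 0) + 3 * eps  # 3 answers: yes/no/maybe
            p *= num / den
        scores[s] = p
    z = sum(scores.values())                             # normalization, i.e. P(G)
    return {s: p / z for s, p in scores.items()}
```

In a real system the count tables would be precomputed from game logs and updated after each finished game.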
Now, $P(S|G)$ defines a probability distribution over subjects, and when we need to select the next question we have to select the one for which the expected reduction in the entropy of this distribution is maximal:
$argmax_j \left( H[P(S|G)] - \sum_{a=yes,no,maybe,\ldots} P(a \mid G, q_j)\, H[P(S \mid G \cup \{q_j, a\})] \right)$
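A direct (if naive) Python sketch of this selection rule, weighting each answer branch by its predictive probability $P(a \mid G, q)$ as the expectation over answers requires. The name `pqa` for the per-subject answer model $P(\{q, a\} \mid S)$ is hypothetical:

```python
import math

def entropy(dist):
    """Shannon entropy of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_question(post, pqa, questions, answers=("yes", "no", "maybe")):
    """Pick the question with the largest expected entropy reduction.

    post -- dict subject -> P(S | G), the current posterior
    pqa  -- function (s, q, a) -> P({q, a} | S), the per-subject answer model
    """
    h_now = entropy(post)
    best_q, best_gain = None, -1.0
    for q in questions:
        expected_h = 0.0
        for a in answers:
            # unnormalized branch posterior: P(S | G ∪ {q, a}) ∝ P(S|G) · P({q,a}|S)
            branch = {s: p * pqa(s, q, a) for s, p in post.items()}
            p_a = sum(branch.values())          # predictive probability P(a | G, q)
            if p_a == 0:
                continue
            branch = {s: p / p_a for s, p in branch.items()}
            expected_h += p_a * entropy(branch)
        gain = h_now - expected_h
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q
```

This is $O(\text{questions} \times \text{answers} \times \text{subjects})$ per move, which is exactly where the scaling problem described below comes from.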
I've tried to implement this, and it works. But, obviously, as the number of subjects grows, performance degrades, because of the need to recalculate $P(S|G)$ after each move and to compute the updated distributions $P(S \mid G \cup \{q_j, a\})$ for question selection.
I suspect that I have simply chosen the wrong model, constrained by the limits of my knowledge. Or maybe there is an error in the math. Please enlighten me: what should I familiarize myself with, or how can I change the predictor so that it copes with millions of subjects and thousands of questions?
Best Answer
This game looks similar to 20 Questions at http://20q.net, which the creator reports is based on a neural network. Here's one way to structure such a network, similar to the neural network described in Concept description vectors and the 20 question game.
You'd have one input unit and one output unit per question, where `0/1` represents a `no`/`yes` answer; inputs are initially set to `0.5`.

Input units for questions that have been answered are set to `0` or `1`, and the assumption is that the neural network has been trained to make the output units produce values close to `1` for questions that would get a "Yes" answer, given the set of existing answers.
At each stage you would pick the question the NN is least sure about, i.e., the one whose output unit is closest to `0.5`, ask it, and set the corresponding input unit to the answer. At the last stage you pick the output unit from the "final question" list with the value closest to `1`.
Each game of 20 Questions gives 20 datapoints which you can use to update the NN's weights with back-propagation, i.e., you update the weights so that the outputs of the current neural network match the true answers, given all the previously asked questions.
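A toy sketch of that setup, assuming a single hidden layer trained by plain gradient descent; all sizes, names, and the learning rate are illustrative, not taken from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N_Q = 8        # number of questions: one input unit and one output unit per question
HIDDEN = 16    # hidden layer size (illustrative)

W1 = rng.normal(0, 0.5, (HIDDEN, N_Q))
W2 = rng.normal(0, 0.5, (N_Q, HIDDEN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """x is the input vector: answered questions at 0/1, the rest at 0.5."""
    h = sigmoid(W1 @ x)
    return sigmoid(W2 @ h), h

def pick_question(x, answered):
    """Ask the question whose output the net is least sure about (closest to 0.5)."""
    y, _ = forward(x)
    unanswered = [i for i in range(N_Q) if i not in answered]
    return min(unanswered, key=lambda i: abs(y[i] - 0.5))

def train_step(x, target, lr=0.5):
    """One back-propagation step pushing outputs toward the true 0/1 answer vector."""
    global W1, W2
    y, h = forward(x)
    d_out = y - target                       # cross-entropy gradient through sigmoid
    d_hid = (W2.T @ d_out) * h * (1 - h)
    W2 -= lr * np.outer(d_out, h)
    W1 -= lr * np.outer(d_hid, x)
```

After each finished game you'd call `train_step` once per question asked, with `x` holding the answers known at that point and `target` the full true answer vector.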