Solved – Tic Tac Toe AI with Machine Learning

algorithmsgamesmachine learningreinforcement learning

I am new to machine learning and as one of my first projects I decided to make a tic tac toe bot. What I've done so far is a rgb/hex color recognition with the naive bayes algorithm, which was really simple to make. Now I am struggling a bit with making a tic tac toe bot.

My question is how should I proceed with this, is there a dataset, and should I take naive bayes again, or decision tree would be better for tic tac toe? (once again, purpose is just that I learn and dive deeper in machine learning)

Best Answer

Presumably you don't want an AI which looks ahead a few moves and brute-forces the best move. I guess you want an AI which will evaluate the strength of each possible move and choose the best.

One way you can approach this is to train an AI to take an input of the board and an input of where to play next and output a probability that this move will lead to a win.

You can create your own data by playing this AI against itself or against a player which plays randomly. This is more involved than using a dataset with the best moves listed for many positions, it's an option if you can't find such a dataset or if you want a challenge.

One possible way to create your own data and use it to iteratively improve the AI is the following:

Let the AI play a few moves and then pause the game
Select a random move to play (random allows the AI to learn from moves it wouldn't normally make)
Record the state of the game and the new move
Let the AI finish the game and record the result

This approach will create game data with many positions and many actions taken with the expected win/loss/draw result. You can use this data to train an AI to predict the result of the game if a given move is played. Repeat this training cycle to iteratively improve the AI.

Related Solutions

Solved – Training a Tic Tac Toe brain – am I on the right track

The main flaw in this method is the following one. You want to predict a move that leads to winning, but instead the model you use tries to guess one certain move from the database that may lead to winning. To fix that, at least you should estimate your model performance not by predicting the “correct” move for the test example, but by the rate of wins it actually achieves. In most cases, there is no single correct move, since multiple moves can lead to winning. For example, for your object

X[0] = [0,2,2,
        1,1,0,
        0,0,0]

the most reasonable prediction is 6, although the training set probably also contains 9, since it may eventually lead to winning as well. However, the move 9 is worse than 6: the rational opponent would win in that case. However, the model treats that moves equally since the training set lacks any losing strategies.

If you still want to treat it as a supervised learning problem, you can instead model the distribution $P(\textit{result} \mid \textit{move}, \textit{board})$ where $\textit{result} \in \{\textit{win} , \textit{draw}, \textit{lose} \}$. To choose the move, just maximize the probability of winning, or minimize the probability of losing. More generally,

$m = \arg\!\min_{move} ~f\Big(P(\textit{result} = \textit{win} \mid \textit{move}, \textit{board}), P(\textit{result} = \textit{lose} \mid \textit{move}, \textit{board}) \Big)$,

where $f$ is some loss function. You can estimate those conditional probabilities from simulations, similarly to what you done for generation of winning strategies.

However, this model has the problem: it treats your opponent’s moves (and your further moves) as being drawn randomly. Game theory advices to choose your strategy assuming that your opponent is rational, i.e. you chose your move to maximize your expected payoff, assuming your opponent then chooses hers by minimizing it. For tic-tac-toe, the best strategy in this sense can be found deterministically by traversing the tree of board configurations, not even exploiting learning from data.

If you want to account for long-term strategies in games like that, you should use techniques from reinforcement learning, which is, roughly speaking, learning not from class labels, but from simulations. Q-learning is one such technique. Here is the lab assignment asking to implement q-learning exactly for tic-tac-toe. It quotes this chapter on theory of reinforcement learning.

Solved – Preparing for machine learning exam

The answer is that All of the methods can be used for the above problem.

Well, two things should be noted in these kinds of simple problems.

Is it a classification or a regression problem? You might have already guessed that it is a classification problem.
Are there any categorical values in the input features? If yes, does the chosen algorithm work with categorical variables.

The examiner may expect the answer that neural networks, SVM etc. don't work with categorical variables. But in fact you can encode a categorical variable as a series of binary variables. For example if the variable age group takes values {child, young, old}, then you may change this single variable to three binary variables; is_child, is_young and is_old. This way you can use svm or neural network.

Again linear regression looks like an unlikely candidate for a classification problem. But they can be used for classification as well. You don't expect any mentionable performance though.

Best Answer

Related Solutions

Solved – Training a Tic Tac Toe brain – am I on the right track

Solved – Preparing for machine learning exam

Related Question