TLDR - Assuming the "one point less to X" rule, the game is a draw.
For example, if X starts in the center and both players play optimal moves, then the following unique position (up to symmetry) can always be forced by X:
were no matter what X plays on the next two moves, X will end up with one point more than O. (This is a draw if you lower X's points.)
After every move by X (while forcing this position), O has only one correct response!
That is, every move by O can be forced by X. (If O makes just a single mistake, X will have multiple ways to get an advantage of more than just one point.)
Solution
I have solved the game computationally.
If first player X has penalty to start with 1 point less (equivalently, if second player O has compensation to start with 1 point more), then the game is a draw under optimal play. Otherwise, first player wins by 1 point (or more points if second player doesn't play optimally).
I have used alpha-beta pruning with transposition tables for search and bitboards to encode positions. After putting together some quick c++
code, it took me less than one hour to solve the game. (I've shared my initial code on codereview and an updated version on repl.it.)
Below are example games using optimal strategy. (To punish non-optimal moves, you can run my code and explore the variations. Note that it can take multiple minutes if you evaluate a position with only a few pieces on the board.)
Terminology
Given a 5 by 5 grid, call a square "adjacent" if it can be reached by one vertical or horizontal step, unless specified otherwise.
Let "advantage" be the number of raw points player X has more than player O. Raw meaning that we are ignoring the penalty aka compensation. To account for the penalty aka compensation, simply subtract one from the advantage.
First move
The optimal first move for X is to start in the center. This is the only first move after which X can still force a nonnegative advantage even if they make a mistake (plays a random move) on their second move!
The second best first move for X is to play in one of four adjacent squares to the center. Even if O follows up by playing in the center, X can still force positive advantage in this scenario.
The worst first move for X is to play in a corner or in a square adjacent to a corner. Then, O can force negative advantage after playing in the center.
That is, under optimal play, X can secure an advantage of $1$ by starting in any of the five central squares. But, the center itself is the best first move as it forces O to play more precisely.
After X plays the first move in the center, the only good move for O is to then play in one of the four adjacent squares. Otherwise, X can force an advantage greater than $1$, which is either $2$ or $3$.
Optimal game
WLOG (due to symmetry), the optimal game starts as follows: (X takes center and one of the four adjacent squares not opposite the one that O took, while O takes any adjacent square and then one opposite to the one that X took.)
The advantage that is forced under optimal play is shown on the left image. That is, X can continue to play into any of the squares marked with the advantage of $1$. The right image shows the number of different optimal responses by O if X plays there. (Lower number in the right image means O needs to be more precise).
Therefore, one can argue that squares containing $1$ on both images are the best optimal moves. It turns out that if X picks one of these two squares, then O must play in the other of the two. That is, we get one of the following two positions depending on what X picks.
- X continues up-adjacent to center.
- X continues left-down-adjacent to center.
Where again, in both cases, X can play in any of the $1$'s in the left image, where again $1$'s in the right image are arguably the best as they force O to play an exact move.
In both cases, X has two best optimal moves. But, in the first case, both best optimal moves are equivalent due to symmetry. This leads to three possible optimal games.
- X plays in 2nd square on the main diagonal, in the previous first case.
- X makes 3-in-a-row on main antidiagonal, in previous second case.
- X threatens to make double 3-in-a-row, in previous second case.
Notice that in the new third case, X now has a forced move and needs to play precisely from now on to keep the advantage of $1$! So maybe, picking a move that gives O least many options is not always the best optimal move.
In the new first case, whatever optimal move X picks, O will still have multiple optimal choices. I won't go into detail into these games as it starts branching a lot.
Only in the new second case, X has moves that give no choice to O. Expanding on these two choices from X, we have the following two "best" optimal games.
- X connects 3-in-a-row vertically, in new second case.
- X connects 3-in-a-row horizontally, in new second case.
Continuing to expand this newer second case, we have:
- X chooses to make 3-in-a-row.
- X chooses to block 3-in-a-row threatened by O.
The first option is better than the second, so expanding on it:
- X threatens two 3-in-a-row's.
- X threatens one 3-in-a-row.
The first option is again clearly better, and now it is not hard to also see that the first $1$ is the better continuation for the first option. This leads to one "best" optimal game:
Now again, it is not hard to see that $1$ in the bottom left corner is the best optimal move, which leads to the following position.
Once again, the $1$ in the last row is the best optimal move, leading to:
And now finally, X can force the best optimal position:
Finally, whatever X plays on next two moves, the advantage of $1$ is kept.
Notice that every move by O was forced so far.
That is, this unique position (up to symmetry), can always be forced by X!
You can verify all this and also explore various branches, alternatives and responses yourself, using my code that I linked. Btw, to increase the speed of solving early positions, you may want to start saving important optimal moves and responses.
Best Answer
For all pairs of neighbouring squares, Q must have the same probability of choosing a square in that pair. This is because the sum over all pairs is twice the sum over all squares, which is fixed at $1$, so if the probabilities aren't the same for all pairs, there will be one below average, and by covering that pair P has a strategy that is worse for Q than the equiprobable one.
It follows that Q must have the same probability for choosing all white squares and the same probability for choosing all black squares; since the total is fixed at $1$, there is one free parameter in his strategy.
P's situation is more complicated. On an $m\times n$ board she has $m(n-1)+n(m-1)=2mn-m-n$ different moves. She must achieve the same probability of covering each square, since otherwise Q could choose a below-average square and P would be worse off. That yields $mn$ linear constraints on the $2nm-m-n$ probabilities, leaving $nm-m-n$ free parameters. In your case, this also happens to be $1$, and we can calculate the corresponding set of strategies. P can make a horizontal move with probability $p$, an end vertical move with probability $q$ and the middle vertical move with probability $1-4p-2q$. Then the probability of covering one of the corners is $p+q$, and this must be $\frac13$. Thus the probabilities are $p$ for a horizontal move, $\frac13-p$ for an end vertical move and $\frac13-2p$ for the middle vertical move, with any $p\in[0,\frac16]$.
Any pair of these strategies for Q and P forms a mixed Nash equilibrium, and these are the only Nash equilibria.