Solved – How to calculate the probability that white will win a chess game

conditional probabilityprobabilitypython

I am developing a chess engine, and wrote four classifiers (logistic regression) for:

  1. detecting which part of the game it is (opening, middlegame, endgame) based on current board / position
  2. predicting that white will win (no draw games) if in opening position
  3. predicting that white will win (no draw games) if in middlegame position
  4. predicting that white will win (no draw games) if in endgame position

Now for the engine, I would like to compute the probability that white will win (for the minimax algorithm, white tries to increase this probability, black tries to decrease this probability by making moves). My first guess was to compute:

P(W) = P(W|Opening)*P(Opening) + P(W|Middlegame)*P(Middlegame) + P(W|Endgame)*P(Endgame)

Where each probability on the right is from logistic regression above.
But now I am unsure, if this is the right "ansatz".


Edit:
The idea here is to have a bitboard representation of a position and to interpret the coefficients of the logistic regression as piece value tables as is custom in other engines, with the difference that the values in this table are from logistic regression and not from experience of the programmer / player. As you can imagine predicting is stronger at the end of the game (opening 64% accuracy, middlegame 75 %, end game 83% ) so my idea was to write a classifier which detects on which part of the game it is and then to "combine" these different classifiers to have an evalutation function for each board.

I will illustrate this with some concrete probabilities:

 r n b q k b n r
 p p p p p . . .
 . . . . . . . p
 . . . . . p p .
 . . . P P . . P
 . . . . . . . N
 P P P . . P P .
 R N B Q K B . R

 # whichPart =  [0.91647043 0.07489996 0.00862961]
 # prOpen =  [0.20907355 0.79092645]
 # prMiddle =  [0.00829384 0.99170616]
 # prEnd =  [0.00323957 0.99676043]
 # pWhiteWins =  0.8077411092561112

So for the position above, the logistic regression whichPart detects correctly that the position belongs to the opening part ( 0.91 probability). But then prOpen thinks that white will win with probability 0.79, which is unclear at this point and which results in 0.8 probability that white will win.

My question is: How do I incorporate the uncertainity of the logistic regression prOpen when calculating the probability that white will win?

Below are the confusion matrices for the logistic regressions:

  1. Logistic regression for detecting which part of the game it is:

    [[1536  205   20]
     [ 226 1202  376]
     [   6  299 1381]]
    
  2. Logistic regression for detecting if white won the game based on observing one position from the opening:

    [[334 421]
     [240 756]]
    
  3. Logistic regression for detecting if white won the game based on observing one position from the middlegame:

    [[517 238]
     [208 788]]
    
  4. Logistic regression for detecting who won the game based on observing one position from the endgame:

    [[611 144]
     [157 839]]
    

Best Answer

Rather than trying to reinvent the wheel here, it is probably worth noting that there is a huge academic literature working towards solving the game of chess. It is worth reading some of this literature to understand what has already been done, however, there are a few things you will need to consider before you start with your problem. The first thing to note is that you will need to specify what you mean when you refer to the "probability" of winning, because chess is a game where the solution under perfect play is deterministic.

Chess is not a solved game (it is extremely complex), but many of its endgames have been solved, and there is already a large literature on computational methods relating to chess. You can get some basic information in Simon and Schaeffer (1992) and Elkies (1996)


Under perfect play the outcome of chess is deterministic: Formally, chess is a deterministic finite two-player game with perfect information. There is an important result in game-theory called Zermelo's Theorem which proves that any game of this kind has only three possible outcomes at strategic equilibrium: either the first player always wins, or the second player always wins, or a draw always occurs. In other words, under perfect play, the result is deterministic, and so the probability of a white win is either zero or one.

Predicting the probability that white will win for non-perfect play: The above fact means that you are only going to get a non-trivial probability value if you are examining non-perfect play. If you want to examine this as a probabiltiy problem, as opposed to a computational problem in game theory, you will need to specify what kind of imperfect play you are interested in. Presumably you are interested in estimating the probability of a white win for some population of players using imperfect play. The worse the quality of play, the closer the probability is likely to be to one-half, and the better the quality of play, the closer the probability will be to the true zero/one value under perfect play.

What is your predictive information: If you are looking at chess under imperfect play, for a particular population, you will also need to specify the information you are using for your predictions. From your question it appears that you will have some characterization of the chosen strategy in the early-game, mid-game, and end-game. However, in the later part of your question it appears that you intend to take account of the entire board-state at certain points (which will give you a problem at the highest level of complexity that is presently computationally impossible to solve). You will need to have more of a think about the information you will use and the match data you are interested in.