[Math] Probability theory and measuring the true strength of chessplayers

combinatorial-game-theory pr.probability st.statistics

If you wanted to measure the strength of, say, a chess player, the best way would involve knowing the true value of each position: then you could compute the frequency $W$ with which the player finds a winning move in a won position, and the frequency $D$ with which the player finds a drawing move in a drawn position.

Even without a perfect evaluation algorithm, perhaps mathematics offers the possibility of saying something about a player's $W$ and $D$? So I ask: do there exist tools in probability theory, if not for chess then at least for some class of idealized games (where only the morphology of the game tree would matter), that would allow prediction of one player's winning percentage over another given just the two players' $W$ and $D$ frequencies?

If yes, then the distribution of winning percentages in a population of players might serve as data for an inverse problem allowing the statistical estimation of $W$ and $D$ frequencies (or at least associated derived quantities, or relative quantities).

Also welcome: thoughts about refining the model in the second paragraph to get results
more realistic for real world games like chess (e.g., separate frequencies for
opening, middle game and ending).


Edit: While I appreciate critiques of my model, I hope the weakness of the model doesn't distract from the purely mathematical question of the 2nd paragraph: for suitable idealized games, can one compute dominance in the game globally from the players' $W$ and $D$ frequencies? If that probability question turns out intractable then my whole project sinks; if it has a positive answer, I can hope to refine the models.
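To make the question concrete, here is a minimal sketch of the forward computation for one deliberately crude idealized game. Every rule in it is an assumption made for illustration, not something taken from chess: the game starts in a drawn position and lasts a fixed number of plies; a player to move in a won position keeps the win with probability $W$ and otherwise lets it slip to a draw; a player to move in a drawn position holds the draw with probability $D$ and otherwise ends up lost; a lost position stays lost; and the result is the true value of the final position. The names `outcome_distribution` and `expected_score` and the default of 80 plies are hypothetical.

```python
def outcome_distribution(white, black, plies=80):
    """Exact result distribution for the toy game described above.

    white, black: (W, D) pairs, the probabilities of keeping a won
    position won and of keeping a drawn position drawn.
    Returns (P(White wins), P(draw), P(Black wins)).
    """
    # dist[v]: probability that the true value of the current position,
    # from White's point of view, is v (+1 won, 0 drawn, -1 lost).
    dist = {1: 0.0, 0: 1.0, -1: 0.0}  # assumption: the start position is drawn
    for ply in range(plies):
        white_to_move = (ply % 2 == 0)
        W, D = white if white_to_move else black
        s = 1 if white_to_move else -1  # value that means "won for the mover"
        new = {1: 0.0, 0: 0.0, -1: 0.0}
        # Mover is winning: keeps the win with probability W, else only draws.
        new[s] += dist[s] * W
        new[0] += dist[s] * (1 - W)
        # Position is drawn: mover holds with probability D, else is lost.
        new[0] += dist[0] * D
        new[-s] += dist[0] * (1 - D)
        # Mover is lost: assumed to stay lost whatever is played.
        new[-s] += dist[-s]
        dist = new
    return dist[1], dist[0], dist[-1]


def expected_score(white, black, plies=80):
    """White's expected score, counting a win as 1 and a draw as 1/2."""
    p_win, p_draw, _ = outcome_distribution(white, black, plies)
    return p_win + 0.5 * p_draw
```

For example, `expected_score((0.99, 0.98), (0.97, 0.95))` gives the predicted score of the first player against the second in this toy model. Because the game is just a three-state Markov chain, the forward problem is easy here; the open part is whether anything comparable survives for game trees with more realistic structure.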

I'm not attached to $W$ and $D$ as the ultimate measure of game playing strength. I am
interested in the mathematical challenge of estimating these frequencies in the absence of
an evaluation oracle.

Also, is it enough merely to point out the naivete of my model? Shouldn't the critic argue that my distortion has a significant numerical effect on the dominance calculation?

Best Answer

Your question makes assumptions with which I disagree.

I do not think that strength means choosing winning moves more frequently in theoretically won positions. The positions encountered in chess are not uniformly random, and the positions you encounter depend on previous moves. You might find someone who reliably executes a nontrivial endgame, but who performs poorly in related positions someone else sets up.

Part of chess is giving an imperfect opponent opportunities to make mistakes. Your measure assumes there is no skill involved in playing theoretically lost positions, but in practice there is.

Although it is popular to call chess mathematical, I think many other games such as backgammon allow much deeper mathematical analysis than chess, in part because positions have equities which are not restricted to $\{0,1/2,1\}$, and there are Monte Carlo methods for estimating the values of positions. Serious backgammon players commonly measure skill in error rates expressed as normalized millipoints per move. In my November 2006 column for GammonVillage, I looked at the correspondence between backgammon error rates and Elo rating differences on one backgammon server, concluding, for example, "100 rating points roughly corresponds to 1.8 millipoints per move."