Fun question.
I don't have enough karma to add this as a comment, so I'll offer it up as an answer, although it is (currently) an incomplete one.
Some comments
The reference you link to for the $N=3$ case does not say that there are $255,168$ valid 3-by-3 tic-tac-toe boards, but that there are $255,168$ distinct 3-by-3 tic-tac-toe games.
That is, if you make a "game tree" where each (valid) board is a node and each move is an edge, you're asking how many nodes are in the tree. The Wikipedia article states that there are $255,168$ distinct paths through the tree (from the root to a leaf), but there are substantially fewer nodes. In fact we can set an upper bound on the number of distinct 3-by-3 tic-tac-toe boards by ignoring the rules of the game and noting that there are only $3^9$ ways to fill a 9x9 board with 3 tokens (blank, X and O), which is only $19,683$. And many of those have too many Xs or Os to be valid.
(The Wikipedia article is also taking symmetry into account, which if I understand you correctly, you want to ignore.)
I happen to be working on a program that generates this game tree for entirely different reasons.
The program may be buggy, so take the numbers above with a grain of salt, but it looks right to me on inspection. I believe I've seen other pages online that corroborate the $N=3$ case, but at the time I was only trying to validate that my program was working properly, so I didn't bother the record the link.
Currently my program runs out of memory (on my little netbook) when I try to generate the game tree for $N=4$, but I imagine that's fixable with a little bit beefier hardware and/or a little optimization of the program for memory footprint (currently I'm storing a lot of metadata about each board state).
Circumstances permitting, I'll try come back and update this answer with more information if and when I have some to share, but I wanted to put some hard numbers (and specific constraints) on the table because I think it clears up some of the confusion in the comment thread.
Assumptions
To clarify the constraints on the problem, here's what I'm assuming:
- We're interested in counting the number of valid "game states" on an $N$-by-$N$ tic-tac-toe board.
- We're ignoring symmetry, so that a board with one
X
in the upper-left corner is considered a distinct board from the one with an X
in the lower-right corner.
X
always moves first.
- The game stops with either player makes a horizontal, vertical or diagonal line of length $N$. Any boards that can only be reached by continuing to play after one of the players has made a line are considered invalid and ignored.
- (The game also stops when we run out of free spaces to move, of course.)
Some data based on enumeration
Under these constraints, as you wrote, we can confirm that:
- for $N=1$ there are $2$ valid and distinct boards
- for $N=2$ there are $29$ valid and distinct boards
Based on (programmatic) inspection, I think we can say that:
- for $N=3$ there are $5,478$ valid and distinct boards
- for $N=4$ there are $9,722,011$ valid and distinct boards
If you break these down by ply (turn):
N: 1 2 3 4
------- - -- ---- -------
ply 0: 1 1 1 1
ply 1: 1 4 9 16
ply 2: - 12 72 240
ply 3: - 12 252 1680
ply 4: - 0 756 10920
ply 5: - - 1260 43680
ply 6: - - 1520 160160
ply 7: - - 1140 400400
ply 8: - - 390 895950
ply 9: - - 78 1433520
ply 10: - - - 1962576
ply 11: - - - 1962576
ply 12: - - - 1543080
ply 13: - - - 881760
ply 14: - - - 333792
ply 15: - - - 83440
ply 16: - - - 8220
======= = == ==== =======
TOTAL: 2 29 5478 9722011
-------
I don't see an obvious formula for the sequence $(2,29,5478,9722011,...)$, but a few interesting (IMO) observations about this:
$N=3$ is the smallest board for which player O
can win
$N=3$ is the smallest board that can end in a tie game
$N=2$ is likely the only board that cannot be filled (X
is guaranteed to win on the third move, leaving one slot unfilled)
Both of the even examples have two plies in a row with the exactly same number of boards (for $N=2$ plies 2 and 3 have $12$ boards and for $N=4$ plies 10 and 11 have $1,962,576$). These are also the "widest" plies in their respective trees. (The same is true for $N=1$, but I imagine that's a degenerate case.)
(This didn't hold for $N=4$.) It may just be the law of small numbers, but I notice that the "widest" tier of the game tree is the one where there are $N$ empty spaces left on the board. For $N=1$, this is ply 0 with $1$ board, for $N=2$ this is ply 2 with $12$ boards and for $N=3$ this is ply 6, with $1,520$ boards.
and of course, at $N=1$, player O
doesn't even get to move.
Looking at upper bounds
By the way, as a very rough sanity check, I compared the number of valid boards with $3^{(N^2)}$ (the number of distinct ways to fill an $N\times{}N$ board with 3 symbols):
N: 1 2 3 4
-------- ---- ---- ------- ----------
# Valid: 2 29 5478 9722011
3^(N^2): 3 81 19683 43046721
% Valid: 66.7 35.8 27.8 22.6
We can get a tighter upper bound if we look at the number of ways to fill an $N\times{}N$ board with $count(X) = count(O)$ or $count(X) = count(O)+1$.
That's actually not the hard to figure out if you change your thinking a little bit. Instead of alternating between X and O, imagine you put down all the Xs first, then all the Os.
On the zeroth ply, there are no Xs or Os, so we always have 1 board. (Note that this is $N^2$ choose $0$, written ${{N^2} \choose 0}$).
On the first ply, there is exactly one X, so we have ${{N^2} \choose 1}$ distinct boards.
On the second ply, there is exactly one X and exactly one O, so we have ${{N^2} \choose 1} \times {((N^2)-1) \choose 1}$ boards.
On the third ply, there are two Xs and one O, so we have ${{N^2} \choose 2} \times {{((N^2)-2)} \choose 1}$ boards.
On the fourth ply, there are two Xs and two Os, so we have ${{N^2} \choose 2} \times {{((N^2)-2)} \choose 2}$ boards.
And so on.
In the general case
Note that on ply $p$ there will be $floor({{(p+1)}\over{2}})$ Xs and $floor({{p}\over{2}})$ Os on the board, so let's say:
- $x = floor({{(p+1)}\over{2}})$
and
- $o = floor({{p}\over{2}})$
Note that:
There are ${{N^2} \choose x}$ ways to place $x$ X
marks on an $N\times{}N$ board.
There are ${{N^2 - x} \choose o}$ ways to place $o$ O
marks on an $N\times{}N$ board that is already filled with $x$ X
s.
Hence ignoring winners there are:
${{N^2} \choose x} \times {{N^2 - x} \choose o}$
distinct boards at ply $p$. Or (plugging the definitions of $x$ and $o$ from above):
${{N^2} \choose {floor({{p+1}\over{2}})}} \times {{N^2 - {floor({{p+1}\over{2}})}} \choose {floor({p \over 2})}}$
So an upper bound on the number of distinct boards would the be summation of that ugly formula over $p = 0$ to $p = N^2$.
I imagine that the clever application of algebra could dramatically simplify that expression (especially when you plug the definition of $n \choose k$ in for the choose notation).
To get an even better count we could take into account the winning boards.
Best Answer
I think I have a solution... isnt not as beauty as I would like but my maths knowledge is limited. I will go point by point.
1.FIRST CONSIDERATIONS
We can think the board in the more simple way eliminating all symmetries: specular symmetries and rotational symmetries. So I will think the positions something like a tetris of 4 columns, i.e., the movement is from top to down.
Thinking this way cannot be possible to have any multiplicity of any number more than a pair (2) in any column but this formed by the suddenly apparition of a 2 or 4. So our pairs (or lines of three or four equal tiles) only can appear seeing by files instead of columns.
The pairs that we can have are of two types: contiguous and not contiguous. Contiguous pairs on columns only can appear after a collapse, so they are prohibited on columns of cardinality 4. And contiguous pairs of 2's are completely prohibited no matter what.
We think column as the position of elements but the sudden and random apparition of a 2 or 4 that occur in any board state.
2.COLUMS AND VARIATIONS ON COLUMS
From this we can have 5 types of columns, i.e., of cardinality 0 to 4, I will name as $C_i$ where i is the cardinality. The variations on the columns will depend if:
a) all tiles are different, in this case it will be $(17)_i$ where this is the representation of a falling factorial
b) there is some pair (or pairs), where we want that the pairs of 2 are not contiguous because you can have two contiguous pairs just if one of the number was generated by collapse (you cant generate a 2 by collapse). We will process the pair as a single, so for a column of 3 with a pair and a different number we will count it as $(17)_2$, i.e., the variations over two positions of 17 different values.
c) we must consider the geometry of pairs to compute variations only when the pair can be arranged in a different position. For a column of 3 if there is a pair of 2 it only can be as the extremes of the list because contiguity is prohibited for pair of 2's: 2o2. But for the case that the column have cardinality 4 we must consider the distributions 2o2c and the distribution o2c2. The singles values (the c and the o) from a point of view of the hipervariations (aka variations of level 2) are equivalents. But the case with two pairs different of 2's in xoxo cannot have any type of variation because the structure oxox in terms of the first variation level is equal.
d) to clarify: we calculate on columns over two levels of variations, just considering the different groups over the different distribution of singles and pairs of 2's or pairs of others numbers. Something like a variation level and a hipervariation level. We have then 3 types of sets: singles, pairs of 2's and others pairs, and a multiplicity bigger than a pair isnt possible on columns.
So the variations that we have by cardinality of the columns are: $C_0=1, C_1=17, C_2=(17)_2+16, C_3=(17)_3+3(17)_2-2\cdot 16, and\ C_4=(17)_4+2(17)_3+(17)_2$
This need a explanation: the first addend for each $C_i$ are the variations with all elements different, the others addends are the variations with valid pairs (any pairs are valid but the contiguous 2's). Example: for $C_2$ the variations of 17 different elements over two positions are $(17)_2$ and the valid pairs are just any pair but the 2's (16 type of pairs that arent 2's).
For $C_4$ the valid pairs are all with no contiguity at all because contiguity just can happen after a collapse of numbers that doesnt occur on a full column. You can have a column of 4 after a collapse by the apparition of a 2 or 4 but this is considered as a column of cardinality 3.
For $C_3$ the second number are permutations with repetition of 2 groups (multiplicity 1 and 2) multiplied by combinations of 17 elements in groups of two, i.e. $\binom{3}{2}\binom{17}{2}$. The third number are the invalid combinations with pair of 2's that are contiguous, i.e. $16\cdot 2$
3.VARIATIONS OF COLUMNS: COMPOSITION OF THE BOARD
The next level is to calculate all the variations of these 5 columns over the 4 possible positions (the board is 4x4) eliminating all the duplicates by specular and rotational symmetries.
First I will show a formula to express the variations of $C_i$
$$C_i=(17)_i+\binom{2/3}{i-3}3(17)_{i-1}+\binom{1}{i-4}(17)_{i-2}+(-2)^{i-2}16\cdot\delta_{2, 14\ \text{mod}\ i+1}$$
We need to add to the mix the presence of a 2 or 4 that is generated in every move of the board in any of the free positions, and the cases where one occupied tile is 2048. This is a factor that depends of the free (or blank) positions. And we need to fix the number of valid starting positions too, so
$$B(i)=(2(16-\Sigma i))^{1-\delta_{16,\Sigma i}}(1-\delta_{0,\Sigma i})$$
i.e. the total spaces less the cardinalities of the columns on this composition multiplied by 2 because we can have a 2 or a 4. The exponent is a Kronecker delta that prevent the zero when all columns have cardinality 4 (full board).
The second Kronecker delta eliminates the start with just one tile because you start with 2 tiles placed (you start with a column of cardinality 1).
(After is a fix for a few "floating" starting position, when both tiles arent on the borders of board).
We need to fix, from here, the duplicates that come from specular symmetry and rotational symmetry.
Fix for specular symmetry duplicates
All columns are duplicated by specularity in the sums but the compositions where columns have a perfect specular symmetry on cardinality (think about Rorschach images). So the formula need a general factor of $\frac{1}{2}$ to eliminate all of these duplicities and an internal factor that preserve the original multiplicity of these compositions with perfect symmetry (that are not duplicated).
This internal factor will be a Kronecker delta function that "see" when the composition have a perfect specular cardinality of columns.
The formula at this moment will be
$$S_1=\frac{1}{2}\sum_{a,b,c,d=0}^{4}C_aC_bC_cC_d\cdot B(i)\cdot 2^{\delta_{a,d}\delta_{b,c}}$$
But we need to add a fix for rotational duplicates too.
Fixing rotational duplicates
The rotational duplicates have a condition to exist: that cardinalities had a partial order what ensure a "stairs" shape that make possible for these composition to be rotated, so $a\geq b\geq c\geq d$. See that this law of composition prevent any duplicity by specularity.
This is because we are seeing the board as a tetris with a "gravity" so if the shape isnt something like a stairs the figure cant be rotated because isnt a valid shape with the fall mechanic.
So the fix for the rotational symmetry have this form
$$S_2=\sum_{a=1}^{4}\sum_{b=0}^{a}\sum_{c=0}^{b}\sum_{d=0}^{c}C_aC_bC_cC_d\cdot B(i)\cdot F(i)$$
Where F is a function that eliminates the false duplicates, i.e., the configurations that cant be rotated because they have pairs of contiguous 2's or more than pairs contiguous with other numbers.
In any position, talking of files of 3 or 4 contiguous and equals numbers, the probability for any number in any tile is $\frac{1}{17}$. So the probability that the same number appear contiguous is approximately $\frac{1}{17^3}$ if we are talking of 3 contiguous numbers or $\frac{1}{17^4}$ if we are talking of a full row. For the case of pairs of 2's the probability is $\frac{1}{18^2 17}$ that is an approximation of the mean of apparition (it changes depending on what tiles make the pair).
The amount of possible full rows of the same number is $d$, i.e., the cardinality of the last column in our stairs model for rotational duplicates.
The amount of possible trios will be $c+d$, and the amount of possible pairs of 2's will be $b+c+d$.
We will apply a factor (that is a probability) over each addend of $S_2$ (this is the F function). This is a binomial that is
$$P(X=0)=(1-p)^n, n\in\{d,c+d,b+c+d\}$$
So
$$F(i)=(1-\frac{1}{17^4}(1-\frac{1}{18^4 17}))^d(1-\frac{1}{17^3})^{c+d}(1-\frac{1}{18^2 17})^{b+c+d}$$
The second multiplier is a fix for duplicated full rows considering any numbers and two pairs of 2's.
4.FINAL FORMULA
The final (by now :p) formula is....:
$$S\approx S_1-S_2+S_3$$
Where $S_3=6$ that are a very little amount of starting positions where the first two tiles that appear are out of the borders.
P.S.: maybe a better approach to the problem a generating function... but my maths knowledge about it is near to zero.
P.S.2: sry, I had A LOT of edit of stupids and tiny mistakes (more) and some heavy mistakes (less). The formula of $C_i$ can be written in many different ways... I let one but surely isnt the more efficient on computational or symbolic sense.