Maximum Sudoku pencil marks needed

sudoku

Most Sudoku mobile/video games have a pencil feature which lets you record a candidate in a given cell. Some of these let you pencil in all 9 candidates, and some only a few (3 – 5). The reduction in number of candidates that can be penciled in does not depend on any kid of validation in games that offer it.

I never played a Sudoku game (regardless of difficulty level) where penciling in all 9 numbers as candidates was sensible. In fact, it seems that the row, the column and the region of the target cell would have to all be empty for all 9 numbers to be used as candidates.

What is the realistic/average/probable number of candidates that a cell in Sudoku can have? Is there a way to calculate that based on the difficulty/number of currently filled cells?

Best Answer

There definitely exist Sudoku having some cell without any eliminations that arise directly from initial clues or from unit propagation of naked singles. Of the 49158 known 17-clue puzzles there are 82 with this property. Here is one example:

.......51....36.............4.5...8.2.....6.......1.......2.34..1.4..7..6........

Head over here, import the puzzle, click Take Step to update the pencilmarks, and you'll see that cell C3 still has 9 candidates.

But if we additionally propagate hidden singles, then none of the 17-clue puzzles has a cell without initial eliminations. This shouldn't be surprising since 17-clue puzzles are generally easy and a large fraction of them are solved entirely after initial propagation of naked and hidden singles.

Here are the actual distributions of initial cell candidate counts across all 17-clue puzzles:

# after naked singles only:
     V1      V2      V3      V4      V5      V6      V7      V8      V9
0.23374 0.02876 0.09699 0.20908 0.24695 0.14527 0.03649 0.00269 0.00002

# after naked and hidden singles:
     V1      V2      V3      V4      V5      V6      V7      V8      V9 
0.79189 0.03841 0.05907 0.06201 0.03614 0.01097 0.00145 0.00005 0.00000  

I'm not sure there's much relationship between this distribution and puzzle difficulty, and it's not even easy to pin down what we mean by difficulty. FWIW, here's the same distribution for a large set of much more difficult puzzles (by the measure of Sudoku Explainer difficulty rating > 11):

# after naked singles only:
     V1      V2      V3      V4      V5      V6      V7      V8      V9 
0.30573 0.02490 0.21513 0.27271 0.14950 0.03051 0.00150 0.00002 0.00000 

# after naked and hidden singles:
     V1      V2      V3      V4      V5      V6      V7      V8      V9 
0.31799 0.02526 0.21428 0.26574 0.14559 0.02967 0.00145 0.00002 0.00000 

Not much difference with/without hidden singles here since the hard puzzles are the very ones that don't allow immediate progress in simple ways.

On a related note, there's an interesting and much less well-studied variation of Sudoku called Pencilmark Sudoku or Sukaku. The rules are exactly the same, except your starting clues are a set of candidate eliminations instead of a set of filled-in cells (i.e., you are given a set of negative literals instead of positive ones). I don't think anyone has proven a lower bound for the number of clues required to determine a unique solution in Pencilmark Sudoku, but it is known that this bound is <= 86, which translates to an average of just over 7.9 remaining candidates per cell. By comparison, 17-clue vanilla Sudoku are positively high-clue puzzles when viewed as Pencilmark Sudoku, since they start with 817=136 eliminations in the given cells, but also another 2017=340 eliminations in cells visible from the given cells (necessarily with some degree of overlap, so not quite 340).

Related Question