Choosing columns in array

combinatorics

In an array with four rows and $n\geq 1$ columns, some subset of cells are marked. Can we always choose some columns in such a way that for at least three rows, the number of chosen marked cells and the number of unchosen marked cells differ by at most $1$?

An initial try is to pick some three fixed row (for example the first three rows) and see if it is possible to choose the columns in the required way. However, this is not always the case: it could be that there are three columns, in the first row the first and second columns are marked, in the second row the second and third column, and in the third row the third and first columns. But maybe there is some clever way to pick the three rows?

Best Answer

Yes, we can.

Let $P(n)$ be as follows :

$P(n)$ : In the $4\times n$ array, we can choose some columns in such a way that for at least three rows, the number of chosen marked cells and the number of unchosen marked cells differ by at most $1$.

In the following, $[a,b]$ represents the cell in the $a$-th row and in the $b$-th column. Also, $[a,b]=[c,d]$ means that either that both $[a,b]$ and $[c,d]$ are marked or that both $[a,b]$ and $[c,d]$ are unmarked.

Claim : If $P(7),P(8)$ are true, then $P(n)$ is true for all $n\ge 9$.

Proof :

There are $2^3=8$ possibilities for three cells to be marked or unmarked. So, if $n\ge 9$, then, by the pigeonhole principle, there is at least one pair of distinct integers $(s,t)$ such that $[1,s]=[1,t],$$[2,s]=[2,t]$ and $[3,s]=[3,t]$. We may suppose that $[1,1]=[1,2],$$[2,1]=[2,2]$ and $[3,1]=[3,2]$. Now, consider the $4\times (n-2)$ array made by the $(n-2)$ columns from the third to the $n$-th. Suppose that $P(n-2)$ is true. Then, we can choose some columns (call them $p$ columns) which satisfy our condition. Now, let us come back to the $4\times n$ array. If choosing the first column and the $p$ columns works, then we are done. If it doesn't work, then choosing the second column and the $p$ columns works. So, $P(n)$ is true. $\quad\square$

In the following, let us prove that $P(1),P(2),\cdots,P(8)$ are true.

We use the following two lemmas (the proofs are written at the end of the answer) :

Lemma 1 : If $P(n-1)$ is true and there is at least one column whose four cells are unmarked, then $P(n)$ is true.

Lemma 2 : If $P(n-2)$ is true and there is a pair of distinct integers $(s,t)$ such that at least three of $[i,s]=[i,t]\ (i=1,2,3,4)$ hold, then $P(n)$ is true.

From the lemmas,

$\ \ (\star)$ : we may suppose that there is at least one marked cell in each column and that there is no pair of distinct integers $(s,t)$ such that at least three of $[i,s]=[i,t]\ (i=1,2,3,4)$ hold.

This $(\star)$ reduces the number of cases to consider.

For $n=1$, we can choose the column.
For $n=2$, we can choose the first column.
For $n=3$, if choosing the first and the second columns works, then we are done. If it doesn't work, then we may suppose that $[1,1],[1,2],[2,1],[2,2],[3,1]$ are marked and $[1,3],[2,3],[3,2]$ are unmarked. Then, choosing the first column works.
For $n=4$, if there is a pair of distinct integers $(s,t)$ such that $[1,s]=[1,t]$ and $[2,s]=[2,t]$, then we may suppose that $[3,s]$ is marked and that $[3,t]$ is unmarked.
case 1 : If $[3,u]=[3,v]$ where $u\not=v$ are different from $s$ and $t$, then choosing the $s$-th and the $u$-th columns works.
case 2 : If $[3,u]$ is marked and $[3,v]$ is unmarked, then choosing the $s$-th and the $v$-th columns works.
If there is no pair of distinct integers $(s,t)$ such that $[1,s]=[1,t]$ and $[2,s]=[2,t]$, then we may suppose that $[1,1],[1,3],[2,1],[2,4]$ are marked and $[1,2],[1,4],[2,2],[2,3]$ are unmarked. If choosing the first and the second columns works, then we are done. If it doesn't work, then from $(\star)$, choosing the first column works.
For $n=5$, if there are two distinct pairs of distinct integers $(s,t),(u,v)$ such that $[1,s]=[1,t],$$[2,s]=[2,t],$$[1,u]=[1,v]$ and $[2,u]=[2,v]$, then choosing the $s$-th and the $v$-th columns works.
If there is only one pair of distinct integers $(s,t)$ such that $[1,s]=[1,t]$ and $[2,s]=[2,t]$, then we may suppose that we have the following three cases :
case 1 : $[1,1],[1,2],[2,1],[2,2],[3,1]$ are marked and $[3,2],[1,5],[2,5]$ are unmarked. If choosing the first and the third columns works, then we are done. If it doesn't workd, then choosing the second and the third columns works.
case 2 : $[1,1],[1,2],[1,3],[2,3],[2,4],[3,1]$ are marked and $[1,4],[1,5],[2,1],[2,2],[3,2]$ are unmarked. If choosing the first and the third columns works, then we are done. If it doesn't work, then choosing the second and the third columns works.
case 3 : $[1,3],[1,5],[2,4],[2,5],[3,1]$ are marked and $[1,1],[1,2],[1,4],[2,1],[2,2],[2,3],[3,2]$ are unmarked. If choosing the first and the third and the fourth columns workds, then we are done. If it doesn't work, then choosing the first and the fifth columns works.
For $n=6$, we may suppose that there are two distinct pairs of distinct integers $(s,t),(u,v)$ such that $[1,s]=[1,t],$$[2,s]=[2,t],$$[1,u]=[1,v]$ and $[2,u]=[2,v]$. Then, we may suppose that $[3,s],[3,u]$ are marked and that $[3,t],[3,v]$ are unmarked. Now, choosing the $s$-th and the $v$-th and the one from the rest two columns works.
For $n=7$, we may suppose that there are three distinct pairs of distinct integers $(a,b),(c,d),(e,f)$ such that $[1,a]=[1,b],$$[2,a]=[2,b],$$[1,c]=[1,d],$$[2,c]=[2,d],$$[1,e]=[1,f]$ and $[2,e]=[2,f]$. Then, we may suppose that $[3,a],[3,c],[3,e]$ are marked and that $[3,b],[3,d],[3,f]$ are unmarked. If $[3,g]$ is marked where $g$ is different from $a,b,c,d,e$ and $f$, then choosing the $a$-th, the $c$-th and the $f$-th columns works. If $[3,g]$ is unmarked, then choosing the $a$-th , the $d$-th and the $f$-th columns works.
For $n=8$, we may suppose that there are four distinct pairs of distinct integers $(a,b),(c,d),(e,f),(g,h)$ such that $[1,a]=[1,b],$$[2,a]=[2,b],$$[1,c]=[1,d],$$[2,c]=[2,d],$$[1,e]=[1,f],$$[2,e]=[2,f],$$[1,g]=[1,h]$ and $[2,g]=[2,h]$. Then, we may suppose that $[3,a],[3,c],[3,e],[3,g]$ are marked and that $[3,b],[3,d],[3,f],[3,h]$ are unmarked. Now, choosing the $a$-th, the $d$-th, the $e$-th and the $h$-th columns works.

Finally, let us prove the two lemmas :

Lemma 1 : If $P(n-1)$ is true and there is at least one column whose four cells are unmarked, then $P(n)$ is true.

Proof for Lemma 1 :

Consider the $4\times (n-1)$ array excluding the column. Since $P(n-1)$ is true, we can choose suitable columns (call them $p$ columns) for the $4\times (n-1)$ array. Now, coming back to the $4\times n$ array, we see that choosing the $p$ columns works. $\quad\square$

Lemma 2 : If $P(n-2)$ is true and there is a pair of distinct integers $(s,t)$ such that at least three of $[i,s]=[i,t]\ (i=1,2,3,4)$ hold, then $P(n)$ is true.

Proof for Lemma 2 :

Consider the $4\times (n-2)$ array excluding the $s$-th and the $t$-th columns. Since $P(n-2)$ is true, we can choose suitable columns (call them $p$ columns) for the $4\times (n-2)$ array. Now, let us come back to the $4\times n$ array. If choosing the $s$-th column and the $p$ columns works, then we are done. If it doesn't work, then choosing the $t$-th column and the $p$ columns works. $\quad\square$

Related Solutions

[Math] How many 90 ball bingo cards are there

Let $N^{(k)}_{a,b,c}$ be the number of possible unnumbered layouts of the last $k$ columns, given that there are $a$ numbers left to assign in the first row, $b$ in the second row, and $c$ in the third row. Given the rules (at least one number per column), you have the recursion $$ \begin{eqnarray} N^{(k)}_{a,b,c}&=&N^{(k-1)}_{a-1,b-1,c-1}+N^{(k-1)}_{a-1,b-1,c}+N^{(k-1)}_{a-1,b,c-1}+N^{(k-1)}_{a,b-1,c-1} + N^{(k-1)}_{a-1,b,c}+N^{(k-1)}_{a,b-1,c}+N^{(k-1)}_{a,b,c-1}, \end{eqnarray} $$ with the boundary conditions that $N^{(0)}_{0,0,0}=1$ and $N^{(k)}_{a,b,c}=0$ unless all indices are between $0$ and $k$. You want to find $N^{(9)}_{5,5,5}$. The following Python function performs the calculation:

def N(a,b,c,k,cache={(0,0,0,0):1}):
   if min(a,b,c)<0 or max(a,b,c)>k:
     return 0
   if not cache.has_key((a,b,c,k)):
     val  = N(a-1,b-1,c-1,k-1)
     val += N(a-1,b-1,c,k-1) + N(a-1,b,c-1,k-1) + N(a,b-1,c-1,k-1)
     val += N(a-1,b,c,k-1) + N(a,b-1,c,k-1) + N(a,b,c-1,k-1)
     cache[(a,b,c,k)] = val
   return cache[(a,b,c,k)]

N(5,5,5,9)
> 735210

The result is below the upper bound of ${{9}\choose{5}}^3=126^3=2000376$ obtained by allowing all-blank columns, as it should be.

For numbered layouts, the recursion is slightly different, because the number of ways to choose the numbers depends on the column. Here, let $m_k$ be the number of allowed numbers in the $k$-th column from the end: $m_1=11$, $m_9=9$, and $m_k=10$ otherwise. In a column with no blanks, there are ${m_k}\choose{3}$ ways to choose the numbers; with one blank, ${m_k}\choose{2}$; and with two blanks, $m_k$. The recursion is therefore $$ \begin{eqnarray} M^{(k)}_{a,b,c}&=&{{m_k}\choose{3}}M^{(k-1)}_{a-1,b-1,c-1}+ {{m_k}\choose{2}}\left(M^{(k-1)}_{a-1,b-1,c}+M^{(k-1)}_{a-1,b,c-1}+M^{(k-1)}_{a,b-1,c-1}\right) \\ && + m_k\left(M^{(k-1)}_{a-1,b,c}+M^{(k-1)}_{a,b-1,c}+M^{(k-1)}_{a,b,c-1}\right). \end{eqnarray} $$ Not unexpectedly, the result here is much larger: $$ M^{(9)}_{5,5,5} = 3669688706217187500. $$ As a sanity check on this result, one might consider the average branching factor for the $15$ numbers in an average unnumbered layout: $$ \left(\frac{M^{(9)}_{5,5,5}}{N^{(9)}_{5,5,5}}\right)^{\frac{1}{15}} \approx 7.023. $$ Since in a typical row (with $m_k=10$) the possible outcomes have average branching factors ${{10}\choose{3}}^{1/3}\approx 4.93$, ${{10}\choose{2}}^{1/2}\approx 6.71$, and $10$, this seems very reasonable.

Note that we can also enumerate the possible unnumbered layouts in an entirely different way, using inclusion-exclusion. The basic idea is to count all the ways of placing $5$ numbers in each of the three rows, then remove the cases with an all-blank column, then add back in the cases with two all-blank columns, and so on: $$ \begin{eqnarray} N^{(9)}_{5,5,5} &=& {{9}\choose{5}}^3 - {{9}\choose{1}}{{8}\choose{5}}^3 + {{9}\choose{2}}{{7}\choose{5}}^3 - {{9}\choose{3}}{{6}\choose{5}}^3 + {{9}\choose{4}} \\ &=& 735210, \end{eqnarray} $$ as above. This is an elegant solution to the unnumbered case, but I do not see a way to extend it to the numbered layouts.

[Math] Number of matrices with no repeated columns or rows

This can actually be worked out in a much more straight-forward manner than it seems, through a bit of reasoning.

Suppose that we construct a $10\times15$ matrix such that there are no repeated columns. Now, we must necessarily have that the number of repeated rows is somewhere between 0 and 9 (assuming that one row has to be "unique" first before there is a repeat). And so, we go about finding the number of possibilities for each repeat-count starting with the highest.

It is easy enough to see that there cannot be 9 repeats, as that would require all 15 columns to be either all-zeros or all-ones, and thus at most two would be "unique". Similarly, with 8 repeats, there would be only two "unique" rows, and thus only four possible columns. We can say the same thing about 7 repeats, which would provide only eight possible columns. Thus, we begin with the case of 6 repeats.

If there are 6 repeats, then there are four "unique" rows. In order to satisfy the conditions, these four "unique" rows must contain 15 of the 16 possible 4-bit combinations. The remaining 6 rows can be chosen from amongst the four "unique" rows.

If there are 5 repeats, then there are five "unique" rows, and we can find the number of these by observing that this is proportional to the number of $5\times15$ matrices with no repeated column, for which there are no repeating rows (and thus you subtract off a number proportional to the one we obtained for 6 repeats).

This can then be repeated until we get to $10\times15$, by working out how many $n\times15$ matrices can be made without repeats based on the value for $n-1$.

With the $n\times15$ case, there are ${2^n}\choose{15}$ possible matrices (up to column permutation). From these, we must subtract off the number of matrices that have $k$ repeated rows, with $k$ running from $1$ to $n-4$. We will call the number of $n\times15$ matrices without repeats $M_n$.

Now, we can see that $$ M_n ={2^n\choose15} - \sum_{k=4}^{n-1} S(n,k)M_k $$ where $S(n,k)$ is a Stirling number of the second kind. This is because we are partitioning our set of $n$ rows into non-empty subsets such that there are $k$ unique subsets, and for which order is irrelevant (that is, we take the subset containing the first row to be subset "1", then the subset containing the first row not in the first subset to be subset "2", and so on).

Now, we can do our calculations...

$$ M_4 = {16\choose15}\\ M_5 = {32\choose15} - S(5,4) M_4 = {32\choose15} - 10\times 16\\ M_6 = {64\choose15} - S(6,4) M_4 - S(6,5) M_5 = {64\choose15} - 65 M_4 - 15 M_5\\ \vdots $$ and so on.

Once you have $M_{10}$, you must finish the process by dividing out the possible row permutations - that is, divide by $10!$. Of course, the actual number is going to be very close to ${2^{10}\choose15}/10!$, as one would expect repeats to be far less common than non-repeats.

(Through use of a spreadsheet, I find that the number is roughly $2.709912\times10^{26}$, although some small truncation error may have made its way into that expression)

Best Answer

Related Solutions

[Math] How many 90 ball bingo cards are there

[Math] Number of matrices with no repeated columns or rows

Related Question