[Math] A balanced latin rectangle (more rows than columns)

algorithmscombinatoricslatin-square

In psychology we sometimes use balanced latin squares for the order of our tests to counterbalance first-order carry-over effects (fatigue, learning, etc.) .

For our current study we want to pretest 12 stimuli (let's call them A-F) to see whether they're useful for a later study. We don't want to bore our participants, so we wanted to give them only half of all the material we need to test. We're indifferent about the size of the subset of 12 as long as it is anything between 4-8 stimuli per participant.

For a different reason (to achieve sufficient statistical power) we need at least 132 participants (at least 11 runs where each stimulus occurs first), we don't want to exceed this too heavily.

A balanced latin square 6*6 isn't too hard to construct. There is a Matlab script as well.

A   B   F   C   E   D
B   C   A   D   F   E
C   D   B   E   A   F
D   E   C   F   B   A
E   F   D   A   C   B
F   A   E   B   D   C

But is it also possible to construct a balanced (latin) rectangle (6 columns wide), where each letter is followed by another letter an equal amount of times? How many rows (participants) would this yield?

Maybe somebody with a bit more handle on this problem will enjoy the puzzle!

Sorry if my language is too idiosyncratic, if I can clarify with the appropriate jargon I'll duly comply, this is quite outside my field.

Splitting it in the middle and then adding the broken-up orders seemed the wrong approach to me.


Edit: Can I find one computationally? I have no idea how ridiculous that question is, but the sheer number of permutations (479 001 600) does seem daunting.


Edit 2: I didn't want to make this question too much about our experiment, but apparently that made it less clear. I'm sorry. I edited the clarifications into the question.

Best Answer

I need to change your parameters a bit, but it sounded like you would be flexible, so let me suggest the following idea.

The idea only works when the number of stimuli is a prime number $p$. So if you need to test exactly 12 stimuli, then this is useless, but may be you can leave one out, or add a placebo/null test to the mix, and use this idea with $p=11$ or $p=13$.

The scheme has $p(p-1)$ rows and $k$ columns, where $k$ is any number between $2$ and $p$ inclusive.

Instead of letters A,B,... I use numbers $0,1,\ldots,p-1$ as table entries. One of the standard constructions for Latin squares is the following: First pick a parameter $m$ that is an integer in the range $1\le m<p$. Then put on row #$i$ and column #$j$ the number that equals the remainder of $i+mj$ when divided by $p$. This gives us a $p\times p$ Latin square. Call it $L(m,p)$. In this Latin square all the pairs of consecutive entries on all rows differ by $m$. Therefore this square alone is the very opposite of balanced. Within this square a zero is always followed by an $m$, a one by $m+1$ et cetera. Note that we count cyclically $\pmod p$, so an entry $p-m$ is always followed by $(p-m)+m=p\equiv 0$, and so forth.

Here comes the trick. We build a larger table with $p(p-1)$ rows by putting all the Latin squares $L(1,p)$, $L(2,p)$, $\ldots$, $L(p-1,p)$ on top of each other. By using all the possible values of $m$ we get a balanced table in the end!

We can take the $k$ first columns of this large table, and be done with it. Each entry occurs on all the columns exactly $p-1$ times. If two distinct entries, say $a$ and $b$, differ by $m=b-a$, the pair $(a,b)$ appears in the rows of $L(m,p)$ exactly $k-1$ times - once per each pair of adjacent columns, and doesn't appear on any other rows. There are no repetitions of stimuli within rows, as the rows are parts of rows of a Latin square.

Why doesn't this work with $p=12$? IOW, why do I insist that $p$ must be a prime? The reason is that the formula $i+m*j \bmod p$ gives a Latin square only, when $m$ is coprime to $p$. For example, if $p=12$ and $m=6$, then the rows of $L(6,12)$ look like 0,6,0,6,...; 1,7,1,7,...

With $p=13$ you get 156 rows, so the table may be too large for you. Another possibly troubling feature of this construction is the following. The rows of the Latin square $L(1,p)$ look like 0,1,2,...; 1,2,3,...; so they have intersections of $k-1$ elements. This may be bad for eliminating secondary correlations from your test. If you do $k=6$ tests per participant, then five participants will see stimulus #1 followed by stimulus #2. That's ok, but I am a bit troubled by the fact that four out of those five will see stimulus #3 next. And the fifth person won't see anything, because his/her day ends after stimulus #2. Similar patterns appear in other component squares $L(m,p)$. If these shortcomings make the construction unusable, then I apologize for wasting your time.

[Edit: Oh boy, I should learn to proofread and not post in such haste. I apologize for the mostly illegible first version :-(]

Related Question