Solved – Correlation between two Decks of cards

correlationinformation theorypearson-r

I've written a program to simulate an overhand card shuffle.

Each card is numbered, with suit going from CLUBS, DIAMONDS, HEARTS, SPADES and the rank from Two up to Ten then Jack, Queen, King and Ace. Thus the Two of Clubs has a Number of 1, the Three of Clubs a 2 ….Ace of Clubs is 13… Ace of Spades is 52.

One of the methods to determine how shuffled the cards are is to compare it to an unshuffled card and see if the order of cards is correlated.

That is, I might have these of cards, with the unshuffled card for comparison:

Unshuffled Shuffled Unshuffled number Shuffled number
Two of Clubs Three of Clubs 1 2
Three of Clubs Two of Clubs 2 1
Four of Clubs Five of Clubs 3 4
Five of Clubs Four of Clubs 4 3

The correlation by the Pearson method would be: 0.6

With a large set of cards (all 52) you might see patterns emerge. My Hypothesis is that after more shuffling you'll get less correlation.

However, there are lots of ways to measure correlation.

I've tried my hand at Pearson's correlation but I'm not sure if this is the right correlation to use in this situation.

Is this a suitable correlation measure? Is there a more suited measure?

Bonus Points I sometimes see this sort of data in my results:

Sample Card Correlation results in a graph with multiple separate lines with different gradients

Clearly there is some correlation, because there are multiple separate lines of with different gradients, but I don't know how you measure the separate 'trendlines'?

Best Answer

You can measure the relative level of correlation (or more precisely, the increasing level of randomness) using the Shannon entropy of the difference in face value between all pairs of adjacent cards.

Here's how to compute it, for a randomly shuffled deck of 52 cards. You start by looping once through the entire deck, and building a sort of histogram. For each card position $i=1,2,...,52$, calculate the difference in face value $\Delta F_{i} = F_{i+1} - F_{i}$. To make this more concrete, let's say that the card in the $(i+1)$th position is the king of spades, and the card in the $i$th position is the four of clubs. Then we have $F_{i+1} = 51$ and $F_{i} = 3$ and $\Delta F_{i} = 51-3 = 48$. When you get to $i=52$, it's a special case; you loop around back to the beginning of the deck again and take $\Delta F_{52} = F_{1} - F_{52}$. If you end up with negative numbers for any of the $\Delta F$'s, add 52 to bring the face value difference back into the range 1-52.

You will end up with a set of face value differences for 52 pairs of adjacent cards, each one falling into an allowed range from 1-52; count the relative frequency of these using a histogram (i.e., a one-dimensional array) with 52 elements. The histogram records a sort of "observed probability distribution" for the deck; you can normalize this distribution by dividing the counts in each bin by 52. You will thus end up with a series of variables $p_{1}, p_{2}, ... p_{52}$ where each one may take on a discrete range of possible values: {0, 1/52, 2/52, 3/52, etc.} depending upon how many pairwise face value differences ended up randomly in a particular bin of the histogram.

Once you have the histogram, you can calculate the Shannon entropy for a particular shuffle iteration as $$E = \sum_{k=1}^{52} -p_{k} ln(p_{k})$$ I have written a small simulation in R to demonstrate the result. The first plot shows how the entropy evolves over the course of 20 shuffle iterations. A value of 0 is associated with a perfectly ordered deck; larger values signify a deck which is progressively more disordered or decorrelated. The second plot shows a series of 20 facets, each containing a plot similar to the one that was originally included with the question, showing shuffled card order vs. initial card order. The 20 facets in the 2nd plot are the same as the 20 iterations in the first plot, and they are also color coded the same as well, so that you can get a visual feel for what level of Shannon entropy corresponds to how much randomness in the sort order. The simulation code that generated the plots is appended at the end.

Shannon information entropy vs. shuffle iteration

Shuffle order vs. start order for 20 iterations of shuffling, showing cards becoming progressively less correlated and more randomly distributed over time.

library(ggplot2)

# Number of cards
ncard <- 52 
# Number of shuffles to plot
nshuffle <- 20
# Parameter between 0 and 1 to control randomness of the shuffle
# Setting this closer to 1 makes the initial correlations fade away
# more slowly, setting it closer to 0 makes them fade away faster
mixprob <- 0.985 
# Make data frame to keep track of progress
shuffleorder <- NULL
startorder <- NULL
iteration <- NULL
shuffletracker <- data.frame(shuffleorder, startorder, iteration)

# Initialize cards in sequential order
startorder <- seq(1,ncard)
shuffleorder <- startorder

entropy <- rep(0, nshuffle)
# Loop over each new shuffle
for (ii in 1:nshuffle) {
    # Append previous results to data frame
    iteration <- rep(ii, ncard)
    shuffletracker <- rbind(shuffletracker, data.frame(shuffleorder,
                            startorder, iteration))
    # Calculate pairwise value difference histogram
    freq <- rep(0, ncard)
    for (ij in 1:ncard) {
        if (ij == 1) {
            idx <- shuffleorder[1] - shuffleorder[ncard]
        } else {
            idx <- shuffleorder[ij] - shuffleorder[ij-1]
        }
        # Impose periodic boundary condition
        if (idx < 1) {
            idx <- idx + ncard
        }
        freq[idx] <- freq[idx] + 1
    }
    # Sum over frequency histogram to compute entropy
    for (ij in 1:ncard) {
        if (freq[ij] == 0) {
            x <- 0
        } else {
            p <- freq[ij] / ncard
            x <- -p * log(p, base=exp(1))
        }
        entropy[ii] <- entropy[ii] + x
    }
    # Shuffle the cards to prepare for the next iteration
    lefthand <- shuffleorder[floor((ncard/2)+1):ncard]
    righthand <- shuffleorder[1:floor(ncard/2)]
    ij <- 0
    ik <- 0
    while ((ij+ik) < ncard) {
        if ((runif(1) < mixprob) & (ij < length(lefthand))) {
            ij <- ij + 1
            shuffleorder[ij+ik] <- lefthand[ij]
        }
        if ((runif(1) < mixprob) & (ik < length(righthand))) {
            ik <- ik + 1
            shuffleorder[ij+ik] <- righthand[ik]
        }
    }
}
# Plot entropy vs. shuffle iteration
iteration <- seq(1, nshuffle)
output <- data.frame(iteration, entropy)
print(qplot(iteration, entropy, data=output, xlab="Shuffle Iteration", 
            ylab="Information Entropy", geom=c("point", "line"),
            color=iteration) + scale_color_gradient(low="#ffb000",
            high="red"))

# Plot gradually de-correlating sort order
dev.new()
print(qplot(startorder, shuffleorder, data=shuffletracker, color=iteration,
            xlab="Start Order", ylab="Shuffle Order") + facet_wrap(~ iteration,
            ncol=4) + scale_color_gradient(low="#ffb000", high="red"))
Related Question