You can measure how much correlation with the original ordering remains (or, more precisely, how much randomness has accumulated) using the Shannon entropy of the differences in face value between all pairs of adjacent cards.
Here's how to compute it, for a randomly shuffled deck of 52 cards. You start by looping once through the entire deck, and building a sort of histogram. For each card position $i=1,2,...,52$, calculate the difference in face value $\Delta F_{i} = F_{i+1} - F_{i}$. To make this more concrete, let's say that the card in the $(i+1)$th position is the king of spades, and the card in the $i$th position is the four of clubs. Then we have $F_{i+1} = 51$ and $F_{i} = 3$ and $\Delta F_{i} = 51-3 = 48$. When you get to $i=52$, it's a special case; you loop around back to the beginning of the deck again and take $\Delta F_{52} = F_{1} - F_{52}$. If you end up with negative numbers for any of the $\Delta F$'s, add 52 to bring the face value difference back into the range 1-52.
You will end up with a set of face value differences for 52 pairs of adjacent cards, each one falling into an allowed range from 1-52; count the relative frequency of these using a histogram (i.e., a one-dimensional array) with 52 elements. The histogram records a sort of "observed probability distribution" for the deck; you can normalize this distribution by dividing the counts in each bin by 52. You will thus end up with a series of variables $p_{1}, p_{2}, ... p_{52}$ where each one may take on a discrete range of possible values: {0, 1/52, 2/52, 3/52, etc.} depending upon how many pairwise face value differences ended up randomly in a particular bin of the histogram.
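To make the wrap-around difference and normalization steps concrete, here is a minimal R sketch; it uses a random permutation from `sample()` as a stand-in for whatever deck ordering you want to evaluate, and mirrors the variable naming in the full simulation code below:
ncard <- 52
# A random permutation stands in for the deck ordering being evaluated
shuffleorder <- sample(ncard)
# Adjacent face value differences, wrapping around from position 52 back to 1
dF <- diff(c(shuffleorder, shuffleorder[1]))
# Add 52 to any negative differences (periodic boundary condition)
dF <- ifelse(dF < 1, dF + ncard, dF)
# Normalized histogram: p_1, ..., p_52
p <- tabulate(dF, nbins = ncard) / ncard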
Once you have the histogram, you can calculate the Shannon entropy for a particular shuffle iteration as $$E = -\sum_{k=1}^{52} p_{k} \ln(p_{k})$$ where any bin with $p_{k} = 0$ contributes zero (the usual convention $0 \ln 0 = 0$). I have written a small simulation in R to demonstrate the result. The first plot shows how the entropy evolves over the course of 20 shuffle iterations. A value of 0 is associated with a perfectly ordered deck; larger values signify a deck which is progressively more disordered or decorrelated. The second plot shows a series of 20 facets, each containing a plot similar to the one originally included with the question, showing shuffled card order vs. initial card order. The 20 facets in the second plot correspond to the 20 iterations in the first plot and share the same color coding, so you can get a visual feel for what level of Shannon entropy corresponds to how much randomness in the sort order. The simulation code that generated the plots is appended at the end.
library(ggplot2)
# Number of cards
ncard <- 52
# Number of shuffles to plot
nshuffle <- 20
# Parameter between 0 and 1 to control randomness of the shuffle
# Setting this closer to 1 makes the initial correlations fade away
# more slowly, setting it closer to 0 makes them fade away faster
mixprob <- 0.985
# Make data frame to keep track of progress
shuffleorder <- NULL
startorder <- NULL
iteration <- NULL
shuffletracker <- data.frame(shuffleorder, startorder, iteration)
# Initialize cards in sequential order
startorder <- seq(1,ncard)
shuffleorder <- startorder
entropy <- rep(0, nshuffle)
# Loop over each new shuffle
for (ii in 1:nshuffle) {
    # Append previous results to data frame
    iteration <- rep(ii, ncard)
    shuffletracker <- rbind(shuffletracker, data.frame(shuffleorder,
                                                       startorder, iteration))
    # Calculate pairwise value difference histogram
    freq <- rep(0, ncard)
    for (ij in 1:ncard) {
        if (ij == 1) {
            idx <- shuffleorder[1] - shuffleorder[ncard]
        } else {
            idx <- shuffleorder[ij] - shuffleorder[ij-1]
        }
        # Impose periodic boundary condition
        if (idx < 1) {
            idx <- idx + ncard
        }
        freq[idx] <- freq[idx] + 1
    }
    # Sum over frequency histogram to compute entropy
    for (ij in 1:ncard) {
        if (freq[ij] == 0) {
            x <- 0
        } else {
            p <- freq[ij] / ncard
            x <- -p * log(p, base=exp(1))
        }
        entropy[ii] <- entropy[ii] + x
    }
    # Shuffle the cards to prepare for the next iteration
    lefthand <- shuffleorder[floor((ncard/2)+1):ncard]
    righthand <- shuffleorder[1:floor(ncard/2)]
    ij <- 0
    ik <- 0
    while ((ij+ik) < ncard) {
        if ((runif(1) < mixprob) & (ij < length(lefthand))) {
            ij <- ij + 1
            shuffleorder[ij+ik] <- lefthand[ij]
        }
        if ((runif(1) < mixprob) & (ik < length(righthand))) {
            ik <- ik + 1
            shuffleorder[ij+ik] <- righthand[ik]
        }
    }
}
# Plot entropy vs. shuffle iteration
iteration <- seq(1, nshuffle)
output <- data.frame(iteration, entropy)
print(qplot(iteration, entropy, data=output, xlab="Shuffle Iteration",
ylab="Information Entropy", geom=c("point", "line"),
color=iteration) + scale_color_gradient(low="#ffb000",
high="red"))
# Plot gradually de-correlating sort order
dev.new()
print(qplot(startorder, shuffleorder, data=shuffletracker, color=iteration,
xlab="Start Order", ylab="Shuffle Order") + facet_wrap(~ iteration,
ncol=4) + scale_color_gradient(low="#ffb000", high="red"))
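As an aside, the per-iteration histogram and entropy calculation above can be condensed into a single vectorized helper; this is just a sketch, and it should agree with the loop-based version up to floating-point rounding:
shannon_entropy <- function(order) {
    n <- length(order)
    # Adjacent differences with periodic boundary condition
    dF <- diff(c(order, order[1]))
    dF <- ifelse(dF < 1, dF + n, dF)
    # Normalized histogram, dropping empty bins (0 * log(0) treated as 0)
    p <- tabulate(dF, nbins = n) / n
    p <- p[p > 0]
    -sum(p * log(p))
}
# e.g. shannon_entropy(1:52) returns 0 for a perfectly ordered deck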
Here's a simulated example of two prices that are very highly correlated ($\rho = 0.9875$). When you attempt to predict the price change in one using the lagged value of the other, very little of the variation in the price change is explainable:
. clear
. set seed 12092021
. set obs 102
Number of observations (_N) was 0, now 102.
. gen t = _n
. tsset t
Time variable: t, 1 to 102
Delta: 1 unit
. gen p1 = 1 + 3*t + rnormal(0,5)
. gen p2 = 3 + 2*t + rnormal(0,10)
. corr p1 p2
(obs=102)
| p1 p2
-------------+------------------
p1 | 1.0000
p2 | 0.9875 1.0000
. reg FD.p2 p1
Source | SS df MS Number of obs = 101
-------------+---------------------------------- F(1, 99) = 0.01
Model | .727541841 1 .727541841 Prob > F = 0.9436
Residual | 14322.4337 99 144.671048 R-squared = 0.0001
-------------+---------------------------------- Adj R-squared = -0.0100
Total | 14323.1613 100 143.231613 Root MSE = 12.028
------------------------------------------------------------------------------
FD.p2 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
p1 | .0009672 .0136392 0.07 0.944 -.0260959 .0280303
_cons | 1.665843 2.420693 0.69 0.493 -3.137338 6.469024
------------------------------------------------------------------------------
. reg FD.p1 p2
Source | SS df MS Number of obs = 101
-------------+---------------------------------- F(1, 99) = 0.01
Model | .683934381 1 .683934381 Prob > F = 0.9171
Residual | 6210.52068 99 62.7325321 R-squared = 0.0001
-------------+---------------------------------- Adj R-squared = -0.0100
Total | 6211.20461 100 62.1120461 Root MSE = 7.9204
------------------------------------------------------------------------------
FD.p1 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
p2 | -.0013704 .0131245 -0.10 0.917 -.0274123 .0246715
_cons | 3.260085 1.574913 2.07 0.041 .1351165 6.385054
------------------------------------------------------------------------------
Here FD. is the lead of the first difference, so $FD.p_t = p_{t+1} - p_t$ is the next period's price change.
The $R^2$ (a.k.a. R-squared) of both models is essentially zero, so very little of the variation in tomorrow's price change is explained by today's price. This illustrates the intuition that, knowing what you know today, you cannot act on this correlation to make money tomorrow.
You can play around with variations on this approach (using the lagged price change as a predictor, non-linear models, adding more data, more noise, or adding trends), with essentially the same results.
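If you prefer R to Stata, a rough translation of the same experiment might look like the following; the random draws (and hence the exact numbers) will not match the Stata output above, but the qualitative result should:
set.seed(12092021)
t <- 1:102
p1 <- 1 + 3 * t + rnorm(102, 0, 5)
p2 <- 3 + 2 * t + rnorm(102, 0, 10)
cor(p1, p2)                           # very high contemporaneous correlation
# Regress tomorrow's price change on today's level of the other price
fd_p2 <- diff(p2)                     # p2[t+1] - p2[t]
summary(lm(fd_p2 ~ p1[-length(p1)]))  # R-squared near zero
fd_p1 <- diff(p1)
summary(lm(fd_p1 ~ p2[-length(p2)]))  # likewise near zero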
You might object that my toy example is flawed because the high correlation is contemporaneous, so if you knew p1 today, you could predict p2 today. I think that is wrong for the following reason. Suppose the DGP is as above, but unknown to you. You are an executive at company 1, and you learn that your CEO had been falsifying earnings and pinching bottoms. The news will become public shortly and lower p1. You can’t short your own stock without a vacation at Club Fed. Should you short the stock of company 2 if you know the correlation between p1 and p2 is ~1? I think that would be a terrible idea. This is what makes the correlation spurious and why that matters.
You could also have a causal relationship but no correlation. When a house has air-conditioning with a preset desired temperature, there will be a strong positive, non-spurious correlation between the amount of electricity used by the AC and the temperature outside. But there will be no correlation between the amount of electricity consumed and the inside temperature. The outside temperature and the inside temperature will also be uncorrelated. The last two are spurious non-correlations in my mind. But all three correlations are valid (though "valid" has no formal definition in statistics), since a correlation is just a transformation of the data.
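To make that concrete, here is a toy simulation of the air-conditioning example; the functional forms are invented purely for illustration and only reproduce the correlational pattern, not the thermostat's control mechanism itself:
set.seed(1)
outside <- runif(365, 20, 40)    # daily outside temperature
inside <- rnorm(365, 22, 0.2)    # thermostat holds the inside near 22 degrees
# Electricity use rises with the outside-inside gap, plus noise
kwh <- 2 * (outside - inside) + rnorm(365, 0, 3)
cor(kwh, outside)     # strongly positive
cor(kwh, inside)      # essentially zero
cor(outside, inside)  # essentially zero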
This is all to say that a strong correlation is not necessary for a causal dependence to exist. And it is certainly not sufficient. Even the sign of the causal relationship can differ from the sign of the correlation. This matters for using correlations to do things out in the real world (i.e., interventions). Nor is this just an issue with time series data; it can happen with cross-sectional observational data as well.
Best Answer
No amount of imputation, time series analysis, GARCH models, interpolation, extrapolation, or other fancy algorithms will do anything to create information where it does not exist (although they can create that illusion ;-). The history of Y's price before X went public is useless for assessing their subsequent correlation.
Sometimes (often preparatory to an IPO) analysts use internal accounting information (or records of private stock transactions) to retrospectively reconstruct hypothetical prices for X's stock before it went public. Conceivably such information could be used to enhance estimates of correlation, but given the extremely tentative nature of such backcasts, I doubt the effort would be of any help except initially when there are only a few days or weeks of prices for X available.