Solved – Finding most likely permutation

maximum likelihood, measurement, measurement error, permutation, uncertainty

[Hoping that this is the right Stack Exchange site; inspired by a true story seen at work]

Joe has a measuring instrument and $n$ objects to be measured (say, a scale and $n$ weights). He measures each one, obtaining a list of measurements $X=\left[x_1, \dots, x_n\right] \in\mathbb R^n$.

Later on, he sends me the objects. I want to find the correspondence between each object and its respective measured value $x_i$, but Joe forgot to number the objects or to sort them in any way that allows me to find which one is the $i$-th object. I therefore measure them again with a similar instrument, obtaining a list of values $Y=\left[y_1, \dots, y_n\right] \in\mathbb R^n$.

If our instruments were perfectly accurate, then $Y$ would be a permutation of $X$. However, our instruments are not perfect; while they both have perfect trueness, they have imperfect precision. In other words, if we measure the same object many times, the average value of the repeated measurements tends to the true value, but the results have (known) standard deviations $\sigma_J$ and $\sigma_I$ (for Joe's instrument and for mine, respectively). Therefore, the values in $X$ will in general be different from the values in $Y$.
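To make the setup concrete, here is a minimal R sketch of the measurement model just described. The function name `simulate_measurements`, the example weights, the standard deviations, and the choice of Normal errors are all illustrative assumptions; the question itself only fixes that both instruments are unbiased with known standard deviations.

```r
# Simulate Joe's readings (x) and my readings (y) of the same n objects.
# Both instruments are unbiased; Normal errors are an assumption made here.
simulate_measurements <- function(theta, sigma_J, sigma_I) {
  n <- length(theta)
  perm <- sample(n)                          # unknown order in which I receive the objects
  x <- theta + rnorm(n, sd = sigma_J)        # x[i] is Joe's reading of object i
  y <- theta[perm] + rnorm(n, sd = sigma_I)  # y[k] is my reading of object perm[k]
  list(x = x, y = y, perm = perm)
}

set.seed(42)
theta <- c(10.0, 10.2, 10.3, 12.0)           # hypothetical true weights
sim <- simulate_measurements(theta, sigma_J = 0.1, sigma_I = 0.15)
print(sim$x)
print(sim$y)
```

With both standard deviations set to zero, `y` is exactly `x[perm]`, which is the "perfectly accurate" case; with positive standard deviations the closely spaced weights can swap order between the two lists.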

In the limiting case where all values are well separated (that is, $\displaystyle\min_{i\ne j}|x_i-x_j|\gg\sigma_J$ and similarly $\displaystyle\min_{i\ne j}|y_i-y_j|\gg\sigma_I$), finding the correct permutation (that is, the correspondence between a value in $X$ and the corresponding value in $Y$) is trivial. When this is not the case, however, how would one go about finding the most likely permutation from $X$ to $Y$ from the available data?

Bonus questions: does the answer change if I no longer assume perfect trueness? Is the case $\sigma_J=\sigma_I$ easier?

EDIT Forgot to ask: how do I compute the probability of a given permutation, that is, the probability that it is the correct one among the space of $n!$ possible permutations? Is there a simple (preferably closed-form) expression for the probability of the optimal permutation (which appears to be the one corresponding to sorting both vectors, see whuber's solution below – at least if the errors are normally distributed)?

EDIT 2 Per Aksakal's observation (see comments to the question): assume that all true weights are strictly distinct (the measurements, both for me and for Joe, can be non-distinct values due to measurement error).

Best Answer

Provided the measurement errors are independent and identically Normally distributed for each instrument, the solution is to match the two sets of measurements in sorted order. Although this is intuitively obvious (comments posted shortly after the question was posted state this solution), it remains to prove it.

To this end, let the first set of measurements in sorted order be $x_1\le x_2\le \cdots \le x_n$ and let the second set of measurements in sorted order be $y_1\le y_2\le \cdots \le y_n.$ Let the error distributions have zero means and variances $\sigma^2$ for the X instrument and $\tau^2$ for the Y instrument. (I find this notation a little more congenial than the subscripting in the question.)

To find the most likely permutation, we solve the maximum likelihood problem. Its parameters are (a) the $n$ true weights $\theta_i$ corresponding to the objects measured by each $x_i$ and (b) the permutation $s$ that makes $y_{s(i)}$ the second measurement of object $i.$ Insofar as the likelihood depends on $(\theta)$ and $s,$ the likelihood of these observations is proportional to the exponential of

$$\mathcal{L}(\theta,s) = -\frac{1}{2}\sum_{i=1}^n \left[\left(\frac{x_i-\theta_i}{\sigma}\right)^2 + \left(\frac{y_{s(i)}-\theta_i}{\tau}\right)^2\right].$$

For any given $s,$ this expression (and therefore its exponential) is maximized term by term by taking

$$\hat\theta_i = \frac{\tau^2 x_i + \sigma^2 y_{s(i)}}{\sigma^2 + \tau^2}.$$
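For completeness, the step behind this estimate can be written out. Each $\theta_i$ appears in only one term of the sum, so it suffices to set the derivative of that term to zero:

$$\frac{\partial}{\partial\theta_i}\left[\left(\frac{x_i-\theta_i}{\sigma}\right)^2 + \left(\frac{y_{s(i)}-\theta_i}{\tau}\right)^2\right] = -\frac{2(x_i-\theta_i)}{\sigma^2} - \frac{2(y_{s(i)}-\theta_i)}{\tau^2} = 0 \quad\Longrightarrow\quad \theta_i\left(\frac{1}{\sigma^2}+\frac{1}{\tau^2}\right) = \frac{x_i}{\sigma^2} + \frac{y_{s(i)}}{\tau^2}.$$

Multiplying both sides by $\sigma^2\tau^2$ and dividing by $\sigma^2+\tau^2$ gives the precision-weighted average $\hat\theta_i$ stated above. The bracketed term is a convex quadratic in $\theta_i$, so this stationary point is its minimum and hence the maximum of $\mathcal{L}$.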

For these optimal values of $\theta,$ the value of $-2\mathcal{L}$ (which we wish to minimize) is

$$-2\mathcal{L}(\hat\theta,s) = \frac{1}{\sigma^2+\tau^2}\sum_{i=1}^n \left(x_i - y_{s(i)}\right)^2.$$
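The closed form can be checked numerically for a single pair of readings. The following R sketch uses made-up values for $x_i$, $y_{s(i)}$, $\sigma$ and $\tau$ (all assumptions for illustration) and compares the formulas with a generic one-dimensional minimizer.

```r
sigma <- 0.5; tau <- 0.8          # hypothetical standard deviations
x <- 10.3; y <- 9.6               # hypothetical pair of readings of one object

# One term of -2 * L as a function of the unknown true weight theta.
term <- function(theta) ((x - theta) / sigma)^2 + ((y - theta) / tau)^2

theta_hat   <- (tau^2 * x + sigma^2 * y) / (sigma^2 + tau^2)  # closed-form minimizer
closed_form <- (x - y)^2 / (sigma^2 + tau^2)                  # closed-form minimum

# Numerical minimization over the interval between the two readings.
num <- optimize(term, interval = range(x, y))

print(c(theta_hat = theta_hat, numeric = num$minimum))
print(c(closed_form = closed_form, numeric = num$objective))
```

The two pairs of numbers should agree up to the tolerance of `optimize`, which supports the claim that each term contributes $(x_i - y_{s(i)})^2/(\sigma^2+\tau^2)$ at the optimum.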

When each squared expression is expanded we obtain (a) a sum of the $x_i^2,$ (b) a sum of the $y_{s(i)}^2$ (which equals the sum of the $y_i^2$ because $s$ is a permutation), and (c) the cross terms,

$$-2\sum_{i=1}^n x_i y_{s(i)}.$$

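The Rearrangement Inequality states that, because the $x_i$ are in nondecreasing order, the sum $\sum_i x_i y_{s(i)}$ is maximized when the $y_{s(i)}$ are also in nondecreasing order, that is, when $s$ is the identity on the sorted lists. This minimizes the cross terms, hence minimizes $-2\mathcal{L}(\hat\theta, s)$ and maximizes $\mathcal{L}(\hat\theta, s)$, QED.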
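For small $n$ the result can be checked by brute force. The R sketch below enumerates all $n!$ matchings and evaluates $-2\mathcal{L}(\hat\theta, s)$ for each; the helper names `all_perms` and `neg2_loglik`, the sample size, and the simulated weights are assumptions made for illustration only.

```r
# All permutations of a vector, by recursion (base R only; fine for small n).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

set.seed(1)
n <- 6
sigma <- 0.5; tau <- 0.8                   # hypothetical instrument SDs
theta <- rnorm(n, mean = 10)               # hypothetical true weights
x <- sort(theta + rnorm(n, sd = sigma))    # first measurements, sorted
y <- sort(theta + rnorm(n, sd = tau))      # second measurements, sorted

# -2 * L(theta_hat, s) for a candidate matching s.
neg2_loglik <- function(s) sum((x - y[s])^2) / (sigma^2 + tau^2)

perms <- all_perms(seq_len(n))
costs <- vapply(perms, neg2_loglik, numeric(1))
print(perms[[which.min(costs)]])              # matching with the smallest cost
print(neg2_loglik(seq_len(n)) - min(costs))   # gap between sorted matching and the minimum
```

If the argument is right, the gap printed on the last line is zero (up to rounding), meaning that no other matching beats pairing the sorted lists in order. Enumeration grows as $n!$, so this is only a sanity check, not a practical method.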

This analysis relies on the Normality assumption. Although it can be relaxed, some distributional assumption is needed, as @fblundun perceptively points out in a comment to the question.

Related Solutions

Solved – How to compare measurements and uncertainties made with different measuring instruments

The model you use to "simulate your problem" can be used almost verbatim to estimate the parameters you are interested in using Bayesian estimation. Here is the model I'll use (using the same notation as you):

$$\begin{aligned} L_B &\sim \mathrm{Normal}(\mu, \sigma) \\ x_i &\sim \mathrm{Normal}(\mu, \sigma) && \text{for } i = 1, \dots, N \\ L_{Ai} &\sim \mathrm{Normal}(x_i \cdot \mathrm{gain} - \mathrm{offset}, \mathrm{dispersion}) && \text{for } i = 1, \dots, N \end{aligned}$$

The glaring omission in this model compared to your problem is that I don't include the assumption that some of the same $x_i$s that got measured by B could then be measured again by A. This could probably be added, but I'm not completely sure how.

This model is implemented in R & JAGS below using very vague, almost flat priors. The data used is the one you generated in your question:

library(rjags)

model_string <- "model{
  for(i in 1:length(L_B)) {
    L_B[i] ~ dnorm(mu, inv_sigma2) # <- reparameterizing sigma into precision
                                   #    needed because of JAGS/BUGS legacy.
  }
  for(i in 1:length(L_A)) {
    x[i] ~ dnorm(mu, inv_sigma2)
    L_A[i] ~ dnorm(gain * x[i] - offset , inv_dispersion2)
  }

  mu ~ dnorm(0, 0.00001)
  inv_sigma2 ~ dgamma(0.0001, 0.0001) 
  sigma <- sqrt(1 / inv_sigma2)
  gain ~ dnorm(0, 0.00001) T(0,)
  offset ~ dnorm(0, 0.00001)
  inv_dispersion2 ~ dgamma(0.0001, 0.0001)
  dispersion <- sqrt(1 / inv_dispersion2)
}"

Let's run it and see how well it does:

model <- jags.model(textConnection(model_string), list(L_A = L_A, L_B = L_B), n.chains=3)
update(model, 3000)
mcmc_samples <- coda.samples(model, c("mu", "sigma", "gain", "offset", "dispersion"), 200000, thin=100)
apply(as.matrix(mcmc_samples), 2, quantile, c(0.025, 0.5, 0.975))
##       dispersion   gain      mu   offset  sigma
## 2.5%     0.01057 0.1366 -0.3116 -0.51836 0.9365
## 50%      0.18657 1.0745 -0.1099 -0.26950 1.0675
## 97.5%    1.20153 1.2846  0.1051 -0.04409 1.2433

The resulting estimates are reasonably close to the values you used when you generated the data:

c(gain_A, offset_A, dispersion_A)
## [1]  1.1 -0.2  0.5

...except for, perhaps, dispersion. But with more data, perhaps more informative priors, and a longer MCMC run, this estimate should improve.
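The code that generated `L_A` and `L_B` is not reproduced here, so to try the model one could draw synthetic data from the model as written above. The following is only a sketch: the function name `simulate_instruments` and the values of `N`, `mu` and `sigma` are assumptions, and it follows the model's sign convention (`gain * x - offset`), which may or may not match the original generating code. Only `gain_A`, `offset_A` and `dispersion_A` are taken from the output shown above.

```r
# Draw synthetic data from the model as written above.
# N, mu and sigma are placeholders chosen for illustration.
simulate_instruments <- function(N, mu, sigma, gain, offset, dispersion) {
  L_B <- rnorm(N, mean = mu, sd = sigma)   # instrument B: direct readings
  x   <- rnorm(N, mean = mu, sd = sigma)   # latent true values seen by instrument A
  L_A <- rnorm(N, mean = gain * x - offset, sd = dispersion)
  list(L_A = L_A, L_B = L_B)
}

set.seed(123)
gain_A <- 1.1; offset_A <- -0.2; dispersion_A <- 0.5
sim <- simulate_instruments(N = 100, mu = 0, sigma = 1,
                            gain = gain_A, offset = offset_A,
                            dispersion = dispersion_A)
str(sim)
```

The resulting list has the same element names as the data list passed to `jags.model` above, so it could be supplied in its place.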

Effect size for pairwise permutation

Using a permutation test doesn't necessarily preclude an effect size, although the connection between the test and the effect size may be broken. In general, a permutation test works by reshuffling the data and computing a statistic many times. The statistic could be something like a mean difference, or it could be a test statistic (e.g., $t$). Either way, this allows for an empirical estimate of the sampling distribution under the null. Comparing your statistic to the sampling distribution allows you to compute a $p$-value. A simple example of a permutation test can be seen in @jbowman's answer here: The z-test vs the χ2-test for comparing the odds of catching a cold in 2 groups. For a pairwise variant, the shuffling would only be within the pairs, but otherwise the principle is the same.
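As a rough illustration of the pairwise variant, the R sketch below shuffles within pairs by randomly flipping the sign of each within-pair difference, and reports the mean difference as a simple effect size alongside the $p$-value. The function name `paired_perm_test`, the number of reshuffles, and the example data are assumptions for illustration, not part of the original answer.

```r
# Paired (sign-flip) permutation test on the mean within-pair difference.
paired_perm_test <- function(a, b, n_perm = 5000) {
  d <- a - b
  observed <- mean(d)
  # Shuffling within a pair swaps its two values, i.e. flips the sign of d[i].
  signs <- matrix(sample(c(-1, 1), n_perm * length(d), replace = TRUE),
                  nrow = n_perm)
  null_stats <- as.vector(signs %*% d) / length(d)
  # Two-sided p-value; the +1 counts the observed arrangement itself.
  p_value <- (sum(abs(null_stats) >= abs(observed)) + 1) / (n_perm + 1)
  list(mean_diff = observed, p_value = p_value)
}

set.seed(7)
before <- rnorm(12, mean = 50, sd = 5)           # made-up paired data
after  <- before + rnorm(12, mean = 2, sd = 1)
res <- paired_perm_test(after, before)
print(res)
```

Here the statistic is the raw mean difference; a standardized version such as `mean(d) / sd(d)` could be reported instead, and the reshuffling logic would stay the same.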
