# Machine Learning – Validity of Unique True Labels for Given Log Loss Value

Tags: classification, log-loss, loss-functions, machine-learning, supervised-learning

Aggarwal's 2021 ICML paper "Label Inference Attacks from Log-loss Scores" seems to argue that the answer to the question in the title is "YES". The paper claims that, given predicted probability values (such as from a logistic regression or neural network) and the log-loss calculated from those predictions and the true labels, there is an algorithm that will produce the true labels.

This would mean that the label vector giving such a log-loss must be unique. If it were not, then $$\left\{dog, cat, dog, cat\right\}$$ and $$\left\{dog, dog, cat, cat\right\}$$ could result in the same log-loss value.

How would the algorithm be able to pick which set of labels is the correct one?

Coding $$dog$$ as $$0$$ and $$cat$$ as $$1$$, I believe that I have such an example.

library(ModelMetrics)
p  <- c(0.2, 0.8, 0.8, 0.2)
y0 <- c(0, 1, 0, 1) # dog, cat, dog, cat
y1 <- c(0, 0, 1, 1) # dog, dog, cat, cat
ModelMetrics::logLoss(y0, p) == ModelMetrics::logLoss(y1, p)


Both log-loss values are the same.
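(The equality can also be seen by hand from the usual binary log-loss formula: each observation contributes $$-\left[y_i \ln p_i + (1-y_i)\ln(1-p_i)\right]$$, and with this $$p$$ both labelings produce the same multiset of per-observation terms, $$\left\{-\ln 0.8,\, -\ln 0.8,\, -\ln 0.2,\, -\ln 0.2\right\}$$, so the two averages must agree.)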

The "Results on Real Binary Classification Datasets" section sure makes it look like we can just take a vector of labels and a vector of claimed probability values; calculate the log-loss from the claimed probability values and labels; and perfectly reconstruct the labels, as implied by the "As our attacks construct…" sentence. Yet that is impossible for the p that I gave: there are multiple plausible label vectors (y0 and y1).

ICML is one of the top conferences in machine learning, so I do not want to be too quick to dismiss the Aggarwal paper. Nonetheless, I seem to have developed a counterexample.

What's going on?

EDIT

After seeing some answers posted, the paper's claim sounds more plausible, but it still seems to have counterexamples. Consider the example after Theorem 1, where $$L_v(\sigma)$$ is the log-loss for true labels $$\sigma$$ and predictions $$v$$.

> As an example, for $$N = 5$$, let $$v = [2, 3, 5, 7, 11]$$. Suppose the true labeling is $$\sigma = [0, 1, 1, 0, 1]$$. Then, the adversary observes $$L_v(\sigma) = \frac{1}{5} \ln\left(2304/55\right)$$ (obtained by plugging in $$v$$ and $$\sigma$$ in Equation 2). For reconstructing the labels, observe that $$T = 3\times4\times6\times8\times12 = 6912$$, so that all we need is to compute the primes that divide $$T \exp(-N L_v(\sigma)) = 165 = 3 \times 5 \times 11$$. This tells us that only the labels for the second, third and fifth datapoints must be 1, which is indeed true.

However, I get multiple possible $$\sigma$$ values that correspond to the same log-loss value.

library(ModelMetrics)

p  <- c(2, 3, 5, 7, 11)
sigma0 <- c(0, 1, 1, 0, 1) # dog, cat, cat, dog, cat
sigma1 <- c(0, 0, 1, 1, 1) # dog, dog, cat, cat, cat
ModelMetrics::logLoss(sigma0, p) == ModelMetrics::logLoss(sigma1, p)


EDIT 2

Yes, I did make a mistake. This is the correct simulation, and it gives distinct loss values.

v  <- c(2, 3, 5, 7, 11) # The prime number code from the paper
p <- v/(1 + v) # Transform to probability values
sigma0 <- c(0, 1, 1, 0, 1) # dog, cat, cat, dog, cat
sigma1 <- c(0, 0, 1, 1, 1) # dog, dog, cat, cat, cat
ModelMetrics::logLoss(sigma0, p) == ModelMetrics::logLoss(sigma1, p) # Unequal!
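For completeness, the paper's reconstruction can be checked end to end once the scores are mapped to probabilities. A quick sketch (in Python rather than R, standard library only; the names `T` and `M` are just my labels for the quantities in the quoted example):

```python
import math

v = [2, 3, 5, 7, 11]          # prime scores from the paper's example
sigma = [0, 1, 1, 0, 1]       # true labels
N = len(v)

p = [vi / (1 + vi) for vi in v]   # probabilities implied by the scores

# Standard binary log-loss, averaged over the N observations
L = -sum(s * math.log(pi) + (1 - s) * math.log(1 - pi)
         for s, pi in zip(sigma, p)) / N

T = math.prod(1 + vi for vi in v)     # 3*4*6*8*12 = 6912
M = round(T * math.exp(-N * L))       # 6912 * (55/2304) = 165

# A label is 1 exactly when its prime divides M
recovered = [int(M % vi == 0) for vi in v]
print(M, recovered)   # 165 [0, 1, 1, 0, 1]
```

The recovered labels match $$\sigma$$ exactly, consistent with the paper's claim once valid probabilities are used.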