[Math] How is "conformal prediction" conformal?

Tags: cv.complex-variables, it.information-theory, pr.probability, reference-request, st.statistics

The question has been clarified by Prof. V. Vovk; see his answer below for discussion.

Recently, the early works of Gammerman, Vapnik, and Vovk [4] were rediscovered by Wasserman et al. [1], who proposed conformal prediction as a promising candidate for distribution-free inference that comes with a confidence-level guarantee.

Judging from the current literature, especially [2,3], the main improvement offered by conformal inference (there implemented with support vector machines) is that it provides a confidence region together with a measure of credibility. For example, in the classical regression model we can obtain a pointwise prediction interval at each observed sample point. The CP, in contrast, is described as follows (a minimal code sketch is given after the quotes):

…Unlike traditional regression methods which produce point predictions, Conformal Predictors output predictive regions that satisfy a given confidence level. The regions produced by any Conformal Predictor are automatically valid, however their tightness and therefore usefulness depends on the nonconformity measure used by each CP… what we call in this paper "Conformal Prediction" (CP). [2]

…We also obtain a measure of “credibility” which serves as an
indicator of the reliability of the data upon which we make our
prediction.[3]
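
To make the contrast with classical pointwise prediction intervals concrete, here is a minimal sketch of split (inductive) conformal regression in Python. The synthetic data, the least-squares point predictor, and the miscoverage level `alpha` are illustrative assumptions of mine, not taken from [2] or [3]; absolute residuals serve as the nonconformity measure.

```python
import numpy as np

# Split (inductive) conformal regression: a minimal sketch.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-3, 3, size=n)
y = 2.0 * x + rng.normal(scale=1.0, size=n)  # noise happens to be Gaussian,
                                             # but the guarantee does not use that

# 1. Split into a proper training set and a calibration set.
x_train, y_train = x[:100], y[:100]
x_cal, y_cal = x[100:], y[100:]

# 2. Fit any point predictor on the training part (here: least squares).
A = np.vstack([x_train, np.ones_like(x_train)]).T
beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def predict(t):
    return beta[0] * t + beta[1]

# 3. Nonconformity scores on the calibration part (absolute residuals).
scores = np.abs(y_cal - predict(x_cal))

# 4. Take the ceil((n_cal + 1) * (1 - alpha))-th smallest score.
alpha = 0.1
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = np.sort(scores)[min(k, len(scores)) - 1]

# 5. The conformal interval at a new point; under exchangeability it
#    covers the true label with probability at least 1 - alpha.
x_new = 1.5
print(predict(x_new) - q, predict(x_new) + q)
```

Unlike the classical Gaussian prediction interval, the coverage guarantee here relies only on exchangeability of the observations, not on a parametric model for the noise.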

And in their introductory book, they explain their use of the name "conformal prediction" by saying:

Most of this book is devoted to a particular method that we call
"conformal prediction". When we use this method, we predict that a new
object will have a label that makes it similar to the old examples in
some specified way, and we use the degree to which the specified type
of similarity holds within the old examples to estimate our
confidence in the prediction. Our conformal predictors are, in other
words, "confidence predictors". [5], pp. 7-8

To be more precise, and in response to @RHahn's comment below, I do not think "conformal" is an arbitrary choice of word, since Vovk mentioned that

"In 1963–1970 Andrei Kolmogorov suggested a different approach to
modelling uncertainty based on information theory; its purpose was to
provide a more direct link between the theory and applications of
probability. On-line compression models are a natural adaptation of
Kolmogorov's programme to the technique of conformal prediction.
Working Paper 8 introduces three on-line compression models:
exchangeability (equivalent to the iid model), Gaussian and
Markov."[6]

Therefore I do believe there is a deeper motivation behind the name "conformal prediction", possibly from complex analysis, that I was not aware of. Thanks!

My questions are:

(1) Given the name "conformal prediction", is this method in some way associated with the concept of conformal mapping in (multivariate) complex analysis?

Does it mean that the old samples can be mapped locally conformally onto the new samples?
Since most CPs are implemented using SVMs, is this related to the shape of the classifying hyperplane determined by the SVM?

(2) (More of an opinion-based question.) How does a conformal predictor differ from existing robust predictors, given that both come with a guarantee that the true value falls into the confidence region with high probability?

References

[1] Lei, Jing, et al. "Distribution-free predictive inference for regression." Journal of the American Statistical Association, just-accepted (2017).

[2] Papadopoulos, Harris, Vladimir Vovk, and Alexander Gammerman. "Regression conformal prediction with nearest neighbours." Journal of Artificial Intelligence Research 40 (2011): 815-840.

[3] Saunders, Craig, Alexander Gammerman, and Volodya Vovk. "Transduction with confidence and credibility." Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI'99). Vol. 2. 1999.

[4] Shafer, Glenn, and Vladimir Vovk. "A tutorial on conformal prediction." Journal of Machine Learning Research 9 (Mar 2008): 371-421.

[5] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer Science & Business Media, 2005.

[6] http://www.vovk.net/cp/index.html

Best Answer

Thanks for your interest. The term “conformal prediction” was suggested by Glenn Shafer, and at first I did not like it exactly for the reason that you mention: it has nothing (or very little) to do with conformal mappings in complex analysis. But then I discovered other meanings, even in maths; e.g., Wikipedia has five on its disambiguation page for “conformal”:

  • Conformal film on a surface (same thickness)
  • Conformal fuel tanks on military aircraft
  • Conformal coating in electronics
  • Conformal hypergraph, in mathematics
  • Conformal software, in ASIC Software

So the word did not look taken to me anymore. The expression that we had used before Glenn proposed “conformal prediction” was even worse (“transductive confidence machine”).

Thanks to Hengrui Luo for drawing my attention to this question.

As for question (2), the answer depends on which robust predictors you have in mind. The predictors with the most similar properties are the ones in classical statistics (such as the standard prediction intervals in linear regression based on Student's t distribution); the main difference is that they are parametric. There is a predictive version of tolerance intervals in nonparametric statistics, but their treatment of objects (the x parts of observations (x, y), where the y are the labels) is limited. Upper bounds on the probability of error given by standard PAC predictors are often too high to be useful.
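
For concreteness, here is a minimal sketch of the classical parametric counterpart mentioned above: the Student's t prediction interval in simple linear regression, whose validity rests on the Gaussian model rather than on exchangeability alone. The `t_prediction_interval` helper, the `alpha` level, and the simple-linear-regression setup are illustrative assumptions added for comparison, not part of the answer.

```python
import numpy as np
from scipy.stats import t as student_t

def t_prediction_interval(x, y, x_new, alpha=0.1):
    """Classical (1 - alpha) prediction interval for a new response at x_new,
    assuming the Gaussian simple linear regression model."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    # Least-squares fit
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    # Residual standard error with n - 2 degrees of freedom
    resid = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))
    # Half-width: t quantile times the predictive standard error
    half = student_t.ppf(1 - alpha / 2, df=n - 2) * s * np.sqrt(
        1 + 1 / n + (x_new - x_bar) ** 2 / sxx
    )
    center = b0 + b1 * x_new
    return center - half, center + half

# Usage with the synthetic data from the earlier sketch:
#   lo, hi = t_prediction_interval(x, y, x_new=1.5)
```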
