Conformal Prediction
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability $\epsilon$, together with a method that makes a point prediction of a label $y$, it produces a set of labels, typically containing the point prediction, that also contains $y$ with probability $1-\epsilon$. Conformal prediction can be applied to any method for producing point predictions: the nearest neighbours method, support vector machines, ridge regression, etc.
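As an illustration, here is a minimal sketch of a full conformal predictor built on the nearest-neighbour method. The function names and the particular nonconformity score (distance to the nearest neighbour with the same label) are illustrative choices, not from the source:

```python
import numpy as np

def nonconformity(X, y, i):
    """1-NN nonconformity score for example i: distance to the nearest
    other example with the same label (larger = stranger)."""
    d = np.linalg.norm(X - X[i], axis=1)
    same = (y == y[i])
    same[i] = False                      # exclude the example itself
    return d[same].min() if same.any() else np.inf

def conformal_set(X_train, y_train, x_new, labels, eps):
    """Full conformal prediction set for x_new at significance level eps:
    a candidate label is kept if its conformal p-value exceeds eps."""
    region = []
    for lab in labels:
        X = np.vstack([X_train, x_new])  # augment data with (x_new, lab)
        y = np.append(y_train, lab)
        scores = np.array([nonconformity(X, y, i) for i in range(len(y))])
        p = (scores >= scores[-1]).mean()   # conformal p-value of (x_new, lab)
        if p > eps:
            region.append(lab)
    return region
```

For a point near the cluster of label-0 examples, only label 0 survives at $\epsilon = 0.2$:

```python
X_train = np.array([[0.0], [0.1], [5.0], [5.1]])
y_train = np.array([0, 0, 1, 1])
conformal_set(X_train, y_train, np.array([0.05]), [0, 1], eps=0.2)  # [0]
```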
Conformal prediction is designed for the on-line setting, in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution (the randomness assumption), then the successive predictions will be right with probability $1-\epsilon$, even though they are based on an accumulating data set rather than on independent data sets.
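This on-line validity can be checked empirically. The toy simulation below (an illustrative sketch, not from the source) draws i.i.d. Gaussian observations and, at each step, predicts an interval for the next observation from a split-style conformal quantile of past absolute residuals around the running mean; the empirical error rate comes out close to $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1                        # target error probability
ys = rng.normal(size=1000)       # i.i.d. observations (randomness assumption)

errors, trials = 0, 0
for n in range(20, len(ys)):
    past = ys[:n]
    # nonconformity score: absolute residual from the running mean
    scores = np.sort(np.abs(past - past.mean()))
    k = int(np.ceil((n + 1) * (1 - eps)))       # conformal quantile index
    q = scores[k - 1] if k <= n else np.inf
    lo, hi = past.mean() - q, past.mean() + q   # prediction interval
    errors += not (lo <= ys[n] <= hi)
    trials += 1

rate = errors / trials
print(rate)   # close to eps = 0.1
```

Note that the guarantee is marginal over the accumulating data: each individual prediction errs with probability at most $\epsilon$, and the long-run frequency of errors concentrates near it.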
The main classes of algorithms in conformal prediction, and their variations, are (the classes listed below are not mutually disjoint):
- (full) conformal predictors
- inductive conformal predictors
- transductive conformal predictors
- Mondrian conformal predictors
- aggregated conformal predictors
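Of these, the inductive (split) conformal predictor is the simplest to implement: fit any point predictor on a proper training set, then calibrate on the absolute residuals of a held-out calibration set. A minimal regression sketch (the function name and interface are illustrative, not from the source):

```python
import numpy as np

def split_conformal_interval(y_cal, y_cal_pred, y_new_pred, eps):
    """Inductive (split) conformal interval around a point prediction,
    calibrated on held-out absolute residuals."""
    scores = np.sort(np.abs(y_cal - y_cal_pred))   # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - eps)))          # conformal quantile index
    if k > n:                  # eps too small for this n: vacuous interval
        return -np.inf, np.inf
    q = scores[k - 1]
    return y_new_pred - q, y_new_pred + q
```

For example, with nine calibration residuals $1, 2, \dots, 9$ and $\epsilon = 0.1$, the conformal quantile is the largest residual, giving an interval of half-width 9:

```python
lo, hi = split_conformal_interval(np.arange(1.0, 10.0), np.zeros(9), 0.0, eps=0.1)
# (-9.0, 9.0)
```

The price of this computational simplicity is statistical efficiency: unlike the full conformal predictor, the split version wastes the calibration data when fitting the point predictor.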
Variations of conformal predictors adapted to probability forecasting include Venn predictors.
Conformal predictors, and related methods, can be used in environments that are more challenging than the on-line prediction protocol under the randomness assumption. This includes:
- batch prediction protocol
- weak teachers
- on-line compression models, including, in addition to the randomness assumption, Gauss linear model and Markov model
An interesting application of conformal prediction is to testing the randomness assumption (or a different on-line compression model).
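For instance, a test martingale can be built from the p-values output by a conformal predictor. The simple "power martingale" below (an illustrative sketch) has expected factor 1 at each step when the p-values are i.i.d. uniform, as they are under the randomness assumption, so a large final value is evidence against that assumption:

```python
import numpy as np

def power_martingale(p_values, gamma=0.5):
    """Power martingale with betting function gamma * p**(gamma - 1).
    Under randomness, conformal p-values are i.i.d. uniform, so each
    factor has expectation 1 and the martingale stays small;
    persistently small p-values make it grow exponentially."""
    p = np.asarray(p_values, dtype=float)
    return np.cumprod(gamma * p ** (gamma - 1.0))
```

With $\gamma = 0.5$, ten p-values all equal to $0.01$ each contribute a factor $0.5 \cdot 0.01^{-0.5} = 5$, so the martingale reaches $5^{10} \approx 10^7$, while ten p-values equal to $1$ shrink it to $0.5^{10} \approx 0.001$.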
Several software packages implement various methods of conformal prediction.
For predecessors of conformal prediction, see the bibliography below.
Some open problems for conformal prediction:
- Efficiency of exchangeability martingales: how efficient are conformal exchangeability martingales (more generally, exchangeability martingales) as a tool for testing the assumption of randomness?
- IID vs exchangeability: what is the relation between the IID and exchangeability assumptions for general observation spaces?
- Statistical and on-line compression modelling: what is the relation between standard statistical modelling and on-line compression modelling?
- Universality of conformal exchangeability martingales: do the conformal exchangeability martingales exhaust the class of exchangeability martingales?
Bibliography
- Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014). Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications. Morgan Kaufmann, Chennai.
- Vladimir Vovk, Alexander Gammerman, and Glenn Shafer (2005). Algorithmic Learning in a Random World. Springer, New York.
- Vladimir Vovk, Alexander Gammerman, and Glenn Shafer (2022). Algorithmic Learning in a Random World (Second Edition). Springer, Cham.
- This question on MathOverflow, and the answers to it, discuss the name "conformal prediction".