Solved – Why don’t we see Copula Models as much as Regression Models

copula, joint distribution, markov-chain-montecarlo, probability, r

Is there any reason that we don't see Copula Models as much as we see Regression Models (e.g. https://en.wikipedia.org/wiki/Vine_copula, https://en.wikipedia.org/wiki/Copula_(probability_theory)) ?

I have spent the last few months casually reading about applications of Copulas. As I understand, Copulas allow you to create a joint probability distribution for several variables, and these variables need not share the same class of marginal probability distribution. For example: a Copula could be used to create a joint probability distribution of variables X1 and X2, where X1 follows a Normal distribution and X2 follows an Exponential distribution. Allegedly, this is quite useful for modelling complex and irregular real world phenomena that do not fully conform to "homogeneous and common" probability distributions.
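To make the Normal/Exponential example concrete, here is a minimal sketch in Python (standard library only). It assumes a Gaussian copula, which is just one possible choice of copula; the function name `sample_gaussian_copula` and the parameters `rho`, `rate`, `mu` and `sigma` are hypothetical names chosen for illustration, not taken from any package.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, rate=1.0, mu=0.0, sigma=1.0, seed=0):
    """Draw n pairs (x1, x2) with X1 ~ Normal(mu, sigma) and
    X2 ~ Exponential(rate), coupled by a Gaussian copula whose
    latent correlation is rho (illustrative assumption)."""
    rng = random.Random(seed)
    std = NormalDist()             # standard normal, used for the copula
    marg1 = NormalDist(mu, sigma)  # marginal distribution of X1
    eps = 1e-12                    # keeps uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        # Step 1: correlated standard normals (z1, z2).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: map to uniforms; (u1, u2) carries the copula dependence.
        u1 = min(max(std.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std.cdf(z2), eps), 1.0 - eps)
        # Step 3: apply each marginal's inverse CDF.
        x1 = marg1.inv_cdf(u1)
        x2 = -math.log1p(-u2) / rate  # inverse CDF of Exponential(rate)
        pairs.append((x1, x2))
    return pairs
```

Calling, say, `sample_gaussian_copula(10000, rho=0.7)` returns pairs whose first coordinate looks Normal and whose second looks Exponential, while the two remain positively dependent. Swapping in different marginals only changes Step 3.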

In terms of applications, I have heard that Copula Models (i.e. the joint probability distribution produced by a Copula Model) can be used for different tasks involving Causal Inference and Predictive Modelling. Since Copula Models are after all joint probability distributions, we can use MCMC Sampling to generate random samples from a relevant conditional probability distribution. The mean of these randomly generated samples from the desired conditional distribution can be thought of as the "predicted value" for a new observation, and their variance as a measure of its uncertainty (effectively performing the role of a regression model).
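As a rough illustration of this "copula as regression" idea, here is a minimal Python sketch (standard library only) that predicts X2 from an observed X1. It assumes a Gaussian copula with Normal and Exponential marginals; in that special case the conditional distribution is available in closed form on the latent Gaussian scale, so it can be sampled directly and MCMC is not needed. Other copula families may require numerical inversion or MCMC instead. The function name `predict_x2_given_x1` and all of its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def predict_x2_given_x1(x1, rho, rate=1.0, mu=0.0, sigma=1.0, n=20000, seed=0):
    """Return (mean, variance) of draws from X2 | X1 = x1, where
    X1 ~ Normal(mu, sigma), X2 ~ Exponential(rate), and the pair is
    linked by a Gaussian copula with latent correlation rho."""
    rng = random.Random(seed)
    std = NormalDist()
    eps = 1e-12  # keeps probabilities strictly inside (0, 1)
    # Map the observed x1 to the latent Gaussian scale via its marginal CDF.
    u1 = min(max(NormalDist(mu, sigma).cdf(x1), eps), 1.0 - eps)
    z1 = std.inv_cdf(u1)
    # On the latent scale, Z2 | Z1 = z1 is Normal(rho * z1, sqrt(1 - rho^2)).
    cond_sd = math.sqrt(1.0 - rho ** 2)
    draws = []
    for _ in range(n):
        z2 = rng.gauss(rho * z1, cond_sd)
        u2 = min(max(std.cdf(z2), eps), 1.0 - eps)
        # Back-transform with the inverse CDF of Exponential(rate).
        draws.append(-math.log1p(-u2) / rate)
    return fmean(draws), variance(draws)
```

For example, `predict_x2_given_x1(1.0, rho=0.7)` gives a larger predicted mean than `predict_x2_given_x1(-1.0, rho=0.7)`, while with `rho=0.0` the prediction ignores x1 and falls back to the marginal mean of X2.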

I have read that Copula Models are often used in the financial industry to model correlations and risk in financial markets, and that they are sometimes used in Survival Analysis for modelling dependencies in Survival Times. Apart from this, they do not seem to be nearly as widespread as standard regression models.

My Question: Does anyone know why this is?

  • My first guess as to why Copula Models are less widespread compared to Regression Models, is that the framework and mathematics required in Copulas is arguably far more complex compared to Regression Models. Thus, the potential benefits of Copula Models are never fully realized due to the complexity of the mathematics required in understanding them.

  • My second guess as to why Copula Models are less widespread compared to Regression Models, is that far fewer software implementations exist for Copula Models compared to Regression Models. For example, I have seen some popular R packages that can be used for Copula Models (e.g. https://cran.r-project.org/web/packages/copula/copula.pdf , https://cran.r-project.org/web/packages/VineCopula/index.html , https://www.jstatsoft.org/article/view/v077i08 ) – yet these packages mainly seem to concern themselves with "fitting" the Copulas, and do not focus as much on how to use Copulas for prediction purposes (in the same context as one would use Regression Models). I came across an R package that allows for fitting Conditional Copulas (e.g. https://cran.r-project.org/web/packages/CDVineCopulaConditional/index.html), but it seems strange that this package requires you to fit a new Conditional Copula to the data according to your specifications – and does not allow you to generate random samples from an existing Copula.

Thus, are my assessments reasonable? Could these partly explain why Copula Models are not as widespread as traditional Regression Models?

Can someone please comment on this?

Best Answer

The first and most important reason is that standard regression models had a one-to-two-hundred-year head start on copula models (depending on exactly where you count the genesis of regression models and copula models). Any explanation of the disparity in usage is going to have to start there.

The method of least-squares estimation for fitting functions through data was developed in the early nineteenth century by Legendre and Gauss, and the Gauss-Markov theorem was published by Gauss in 1821. By the late nineteenth century the term "regression" had come into use to describe the narrow phenomenon of regression to the mean, but it was developed further at the end of the nineteenth century in a form that is a clear precursor to the modern theory. In particular, Yule gave a close precursor to the modern regression model in Yule (1897) and Fisher had developed and analysed the standard Gaussian regression model that is used today no later than Fisher (1922).

By contrast, copulas were first introduced into statistics in Sklar (1959) and were developed further over later decades. The initial mathematical result underpinning the field was a "folk theorem" for over a decade, until it was proved by multiple authors in the 1970s. The first statistical conference looking at copulas didn't occur until 1990 and even after this, copulas were only really applied in the field of finance. Copula models did not really become widely visible in the statistics profession until about the turn of the twenty-first century, when Li (2000) popularised them in a seminal article in finance. It is probably only in the last two to three decades that copulas have become broadly known even within the statistical profession. As you point out, the copula theory is mathematically more complex, but it is also much, much younger.

Statistical theories and models tend to start out with narrow usage confined to scholars in the field and then, if they have sufficient value, they expand out to be used more widely by various professionals in a broader range of applied fields. It is not until they become sufficiently widely used in the professions that universities decide it is worth teaching those models in their regular courses. In the present case, copula models have been widely known for only about twenty years and they have probably only started being taught in the universities in the last ten years (and at some universities not yet at all). You only have to go back about a decade and statistics students at a university would not even have heard of copula models (unless they ran into them as a speciality) and would not have had any courses that taught them.

So, if you are a statistician/econometrician and you are over forty, you probably will not have learned about copula models unless you have personally gone out of your way to learn them outside of your university education. However, you will have had at least a few courses that covered regression modelling, GLMs, etc., and you will have had to implement these models regularly as a student in order to complete your degree. If you are a psychologist or scientist over forty, you almost certainly never learned copula models, but you probably would have encountered regression models in your university training. This has a huge impact on the respective levels of usage of the two kinds of model in subsequent professional work.
