Solved – Why don’t we see Copula Models as much as Regression Models

copula, joint distribution, markov-chain-montecarlo, probability, r

Is there any reason that we don't see Copula Models as much as we see Regression Models (e.g. https://en.wikipedia.org/wiki/Vine_copula, https://en.wikipedia.org/wiki/Copula_(probability_theory))?

I have spent the last few months casually reading about applications of Copulas. As I understand, Copulas allow you to create a joint probability distribution for several variables – and each of these variables need not have the same marginal class of probability distribution. For example : A Copula could be made to create a joint probability distribution of variables X1 and X2, where X1 is a Normal Distribution and X2 is an Exponential Distribution. Allegedly, this is quite useful for modelling complex and irregular real world phenomena that do not fully conform to "homogeneous and common" probability distributions.
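The Normal/Exponential example above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a Gaussian copula (one of many possible copula families), Python's standard library rather than any of the R packages mentioned in this question, and made-up parameter values (mean 10, standard deviation 2, rate 0.5, correlation 0.7). The function name `sample_gaussian_copula` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (x1, x2) with X1 ~ Normal(10, 2) and X2 ~ Exponential(rate 0.5),
    coupled through a Gaussian copula with correlation parameter rho.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    std_normal = NormalDist()                    # standard normal, used by the copula
    x1_marginal = NormalDist(mu=10.0, sigma=2.0) # assumed marginal of X1
    x2_rate = 0.5                                # assumed rate of the Exponential marginal
    eps = 1e-12                                  # keeps uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        # Step 1: a pair of correlated standard normals.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: push them through the normal CDF to get dependent uniforms.
        u1 = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        # Step 3: apply each marginal's inverse CDF.
        x1 = x1_marginal.inv_cdf(u1)
        x2 = -math.log(1.0 - u2) / x2_rate
        pairs.append((x1, x2))
    return pairs


pairs = sample_gaussian_copula(5, rho=0.7)
for x1, x2 in pairs:
    print(round(x1, 3), round(x2, 3))
```

The dependence structure lives entirely in steps 1 and 2, while the marginals enter only in step 3, which is why the two variables are free to come from different families.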

In terms of applications, I have heard that Copula Models (i.e. the joint probability distribution produced by a Copula Model) can be used for different tasks involving Causal Inference and Predictive Modelling. Since Copula Models are after all joint probability distributions, we can use MCMC Sampling to generate random samples from a relevant conditional probability distribution. The mean of these randomly generated samples can be thought of as the "predicted value" for a new observation, with their variance describing its uncertainty (effectively performing the role of a regression model).
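To make the "copula as regression" idea concrete, here is a minimal sketch of predicting X2 from a new value of X1. It makes several assumptions that are not part of the question: a Gaussian copula with a known correlation of 0.7, a Normal(10, 2) marginal for X1 and an Exponential marginal with rate 0.5 for X2. For this particular copula the conditional distribution can be sampled directly, so the sketch uses plain Monte Carlo instead of MCMC. Other copula families may genuinely need MCMC or numerical inversion. The function `predict_x2_given_x1` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Illustrative, assumed model components (not estimated from any data).
STD_NORMAL = NormalDist()
X1_MARGINAL = NormalDist(mu=10.0, sigma=2.0)
X2_RATE = 0.5
RHO = 0.7
EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def clamp(p):
    """Keep a probability away from exactly 0 or 1."""
    return min(max(p, EPS), 1.0 - EPS)


def predict_x2_given_x1(x1_new, n_draws=20000, seed=0):
    """Return (mean, standard deviation) of draws from X2 | X1 = x1_new."""
    rng = random.Random(seed)
    # Move the observed x1 onto the copula's standard normal scale.
    z1 = STD_NORMAL.inv_cdf(clamp(X1_MARGINAL.cdf(x1_new)))
    # Under a Gaussian copula, z2 | z1 is Normal(RHO * z1, 1 - RHO^2).
    cond_sd = math.sqrt(1.0 - RHO ** 2)
    draws = []
    for _ in range(n_draws):
        z2 = rng.gauss(RHO * z1, cond_sd)
        u2 = clamp(STD_NORMAL.cdf(z2))
        # Back-transform through the inverse Exponential CDF.
        draws.append(-math.log(1.0 - u2) / X2_RATE)
    return mean(draws), stdev(draws)


for x1_new in (8.0, 10.0, 12.0):
    pred, spread = predict_x2_given_x1(x1_new)
    print(x1_new, round(pred, 3), round(spread, 3))
```

The mean of the draws plays the role of a regression prediction and the standard deviation gives a sense of its uncertainty. Both depend on `x1_new`, so the spread is not constant as it would be in a homoscedastic linear model.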

I have read that Copula Models are often used in the financial industry to model correlations and risk in financial markets, and that they are sometimes used in Survival Analysis for modelling dependencies in Survival Times. Apart from this, however, they do not seem to be nearly as widespread as standard regression models.

My Question: Does anyone know why this is?

My first guess as to why Copula Models are less widespread compared to Regression Models, is that the framework and mathematics required in Copulas is arguably far more complex compared to Regression Models. Thus, the potential benefits of Copula Models are never fully realized due to the complexity of the mathematics required in understanding them.
My second guess as to why Copula Models are less widespread compared to Regression Models, is that far fewer software implementations exist for Copula Models compared to Regression Models. For example, I have seen some popular R packages that can be used for Copula Models (e.g. https://cran.r-project.org/web/packages/copula/copula.pdf , https://cran.r-project.org/web/packages/VineCopula/index.html , https://www.jstatsoft.org/article/view/v077i08 ) – yet these packages mainly seem to concern themselves with "fitting" the Copulas, and do not focus as much on how to use Copulas for prediction purposes (in the same context as one would use Regression Models). I came across an R package that allows for fitting Conditional Copulas (e.g. https://cran.r-project.org/web/packages/CDVineCopulaConditional/index.html), but it seems strange that this package requires you to fit a new Conditional Copula to the data according to your specifications – and does not allow you to generate random samples from an existing Copula.

Thus, are my assessments reasonable? Could these partly explain why Copula Models are not as widespread as traditional Regression Models?

Can someone please comment on this?

Best Answer

The first and most important reason is that standard regression models had a one-to-two-hundred-year head start on copula models (depending on exactly where you place the genesis of regression models and copula models). Any explanation of the disparity in usage has to start there.

The method of least-squares estimation for fitting functions through data was developed in the early nineteenth century by Legendre and Gauss, and the Gauss-Markov theorem was published by Gauss in 1821. By the late nineteenth century the term "regression" had come into use to describe the narrow phenomenon of regression to the mean, but it was developed further at the end of the nineteenth century in a form that is a clear precursor to the modern theory. In particular, Yule gave a close precursor to the modern regression model in Yule (1897) and Fisher had developed and analysed the standard Gaussian regression model that is used today no later than Fisher (1922).

By contrast, copulas were first introduced into statistics in Sklar (1959) and were developed further over later decades. The initial mathematical result underpinning the field was a "folk theorem" for over a decade, until it was proved by multiple authors in the 1970s. The first statistical conference looking at copulas didn't occur until 1990 and even after this, copulas were only really applied in the field of finance. Copula models did not really become widely visible in the statistics profession until about the turn of the twenty-first century, when Li (2000) popularised them in a seminal article in finance. It is probably only in the last two to three decades that copulas have become broadly known even within the statistical profession. As you point out, copula theory is mathematically more complex, but it is also much, much younger.

Statistical theories and models tend to start out with narrow usage confined to scholars in the field and then, if they have sufficient value, they expand out to be used more widely by various professionals in a broader range of applied fields. It is not until they become sufficiently widely used in the professions that universities decide it is worth teaching those models in their regular courses. In the present case, copula models have been widely visible for only about twenty years, and they have probably only started being taught in the universities in the last ten years (and at some universities not yet at all). You only have to go back about a decade and statistics students at a university would not even have heard of copula models (unless they ran into them as a speciality) and would not have had any courses that taught them.

So, if you are a statistician/econometrician and you are over forty, you probably will not have learned about copula models unless you have personally gone out of your way to teach yourself outside of your university education. However, you will have had at least a few courses that covered regression modelling, GLMs, etc., and you will have had to implement these models regularly as a student in order to complete your degree. If you are a psychologist or scientist over forty, you almost certainly never learned copula models, but you probably would have encountered regression models in your university training. This has a huge impact on the respective level of usage of the two models in subsequent professional work.

Related Solutions

Solved – Copulas with Regression

In my opinion the two methods (copula, regression) answer quite different questions. The copula approach is much more general than regression, and one reason why you have not seen regression models based on copulas might be that using copulas is much harder than using regression. Two observations on why this is so:

For a copula fit you need to know or estimate the joint distribution of all variables involved. You do not need this for regression.
If you are only interested in the response, regression gives you the answer more or less directly. But from the joint distribution you need to manufacture the conditional expectation of the response with additional effort.

This extra effort for estimating the joint distribution and only then finding the expected response would need to be justified by the specific problem you are interested in. Two justifications I can think of are: You are actually interested in the joint distribution (that is what you called "traditionally") or you know that your model does not allow for the standard assumptions of regression (additive independent errors, say).

On your questions 1. and 2.: Sure you can do this in theory (if the copula is differentiable and has a density). If you know the joint distribution, you can calculate all marginals and conditional expectations. The problems start when you want to estimate this from data. Unless your problem prescribes a specific, nice parametric copula, you might need special samples or lots of them to do this.
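For the bivariate case, the step from joint distribution to conditional expectation can be written out explicitly. This is a sketch under the assumption that the copula $C$ has a density $c$ and that the marginal distribution functions $F_X$, $F_Y$ have densities $f_X$, $f_Y$. These symbols are introduced here for illustration and do not appear in the original question.

```latex
% Sklar's theorem: joint CDF in terms of the copula and the marginals
F(x, y) = C\bigl(F_X(x), F_Y(y)\bigr)

% Differentiating once in each argument gives the joint density
f(x, y) = c\bigl(F_X(x), F_Y(y)\bigr)\, f_X(x)\, f_Y(y)

% Dividing by f_X(x) gives the conditional density of Y given X = x
f(y \mid x) = \frac{f(x, y)}{f_X(x)} = c\bigl(F_X(x), F_Y(y)\bigr)\, f_Y(y)

% The regression function is then an integral against that density
\mathbb{E}[Y \mid X = x] = \int y \, c\bigl(F_X(x), F_Y(y)\bigr)\, f_Y(y)\, dy
```

The last integral usually has no closed form and must be evaluated numerically or by simulation, which is the "additional effort" compared with reading a fitted value off a regression model.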

Solved – Change Point detection with R and Python leads to different results

Surprisingly, no answers have been given yet. Here I try to offer some (admittedly biased) opinions from my experience with changepoint detection.

First of all, translating code from one language to another is often tricky and error-prone. One example highlighting the difficulty is the reimplementation of a change detection algorithm called LandTrendr, ported from IDL (an interactive language similar to R and Python) to Java (GEE); the translated code gave almost the same results as before, but NOT IDENTICAL ones (https://www.mdpi.com/2072-4292/10/5/691). Regardless, such inconsistencies are unlikely to be the true reason for what you observed for the PELT method, because the code base for the PELT method is relatively small.

I suspect two reasons for your case, one concerning the ill-posedness of your problem/data and another concerning the differing numerical libraries used behind R and Python. Below are more details.

(1) Your R and Python results are very close, which indicates your data/problem has multiple near-optimal solutions close to each other. Any minuscule numerical errors or data errors (e.g., slightly disturbing a datapoint with a very small noise) may shift the detected 'optimal' solution from one to another. Here is a made-up example to explain further. Suppose that the PELT algorithm tries to maximize a criterion; the result [110, 120, 140, 160, 195, 255] has a theoretical value of 0.4312 (I just made up this number), and the result [108, 120, 140, 161, 192, 253] has a theoretical value of 0.4311. The two are very close. If you have a perfect computer with no numerical error, you can pick out the true best one (the one with 0.4312). But with all kinds of numerical errors such as round-off, truncation, and limited machine precision, the algorithm may pick either of them because, NUMERICALLY, the theoretically best one might have a worse optimized value than the other near-optimal ones. In some literature, this is known as model equifinality. In reality, there can be numerous solutions (more than the two used here) that are almost equally good. I touched on this problem briefly in a publication of mine (Figure 1 at https://go.osu.edu/beast2019).

(2) On top of the problem explained in (1), more often than not, Python and R use different math libraries (I mean the BLAS and LAPACK libraries for basic matrix and vector operations and linear algebra). For example, by default, R uses the legacy Fortran implementation, although alternatives (e.g., Intel's MKL and OpenBLAS) can be linked instead. The different libraries (especially when compiled for different CPUs or with different compiler flags) do not give identical results, even though the results agree closely, to within machine precision. In the changepoint detection algorithm I developed (called Rbeast and available at https://github.com/zhaokg/Rbeast), I implemented my own version of BLAS for vector and matrix operations; the numerical results differ even on the same machine/CPU if I use different CPU instruction sets (e.g., SSE, AVX, and AVX512). Again, by 'different' I mean the results are almost the same but not identical (e.g., 0.3434313 vs 0.3434315). If accumulated throughout, these small errors can add up to be large enough to prevent the algorithm from finding the true best solution for the ill-posed problems explained in (1).
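Points (1) and (2) can be illustrated together with a toy sketch. It has nothing to do with the actual PELT code or with any real BLAS library: the two "libraries" below are hypothetical stand-ins that differ only in the order in which they add the same numbers, and the two candidate solutions are made up so that their criteria tie in exact arithmetic.

```python
def sum_left_to_right(terms):
    """Accumulate in the order given (stand-in for one library's behaviour)."""
    total = 0.0
    for t in terms:
        total += t
    return total


def sum_right_to_left(terms):
    """Accumulate in reverse order (stand-in for another library's behaviour)."""
    return sum_left_to_right(list(reversed(terms)))


# Two made-up candidate solutions. Their criteria are sums of the same three
# terms, so with exact arithmetic they would be exactly equal.
candidates = {
    "A": [0.1, 0.2, 0.3],
    "B": [0.3, 0.2, 0.1],
}


def pick_best(summation):
    """Return the candidate whose numerically computed criterion is largest."""
    return max(candidates, key=lambda name: summation(candidates[name]))


best_lib1 = pick_best(sum_left_to_right)
best_lib2 = pick_best(sum_right_to_left)
print(best_lib1, best_lib2)  # prints: A B
```

Floating-point addition is not associative, so `(0.1 + 0.2) + 0.3` and `(0.3 + 0.2) + 0.1` differ in the last bit. That one-bit difference is enough to make the two stand-in libraries disagree about which of two equally good solutions is "optimal".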

Now switching to the statistical point of view, your two solutions are probably not statistically different. If you are familiar with model selection criteria such as AIC, a common rule of thumb is that an AIC difference smaller than ~2.0 gives no statistical evidence that one model is better than the other. So, I assume that your Python solution and R solution are equally good (again, statistically speaking). Given this (i.e., model equifinality), Bayesian methods have been used to circumvent the problem a little bit. In R, bcp is a popular package, and my package Rbeast also aims to address similar problems. Here are some quick runs on your data using bcp and Rbeast.

data = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 2, 1, 0, 0, 1, 0, 1, 0, 2, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 1, 0, 2, 3, 0, 0, 0, 3, 1, 0, 0, 2, 0, 3, 1, 2, 3, 3, 3, 0, 1, 1, 2, 1, 1, 3, 1, 0, 2, 3, 5, 0, 1, 1, 1, 3, 3, 1, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 11, 6, 6, 8, 6, 12, 11, 7, 9, 12, 21, 11, 28, 20, 20, 15, 26, 20, 22, 22, 15, 15, 13, 23, 15, 16, 11, 20, 17, 21, 10, 8, 9, 11, 7, 6, 10, 4, 4, 7, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 16, 13, 13, 4, 4, 14, 14, 11, 7, 9, 11, 16, 15, 5, 8, 17, 10, 12, 7, 13, 19, 21, 7, 14, 13, 11, 9, 18, 9, 15, 8, 4, 3, 2, 1, 2, 1, 2, 4, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 3, 1, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 1, 1, 3, 1, 2, 1, 1, 2, 9, 2, 10, 5, 1, 7, 7, 5, 9, 12, 8, 14, 10, 10, 12, 10, 1, 1, 2, 2)

library(bcp)
out = bcp(data)
plot(out)


library(Rbeast)

# Rbeast does time series decomposition and changepoint detection together.
# season='none' is set below to indicate data has no periodic/seasonal variation.

out = beast(data,season='none') 
print(out)
plot(out)

The first figure is from bcp and the second from Rbeast. The posterior probability curves (e.g., Pr(tcp)) show the probability of changepoint occurrence.
