Solved – Method for generating correlated non-normal data

copulamultivariate analysisrandom-generationreferencesskewness

I'm interested in finding out a method for generating correlated, non-normal data. So ideally some sort of distribution that takes in a covariance (or correlation) matrix as a parameter and generates data that approximates it. But here's the catch: the method I'm trying to find should have the flexibility to also control its multivariate skewness and / or kurtosis.

I'm familiar Fleishman's method and the use of the power method of normal variates, but I believe most of those extensions only allow the user for certain combinations of marginal skewness and kurtosis, leaving the multivariate skewness / kurtosis just out there. What I was wondering is if there is a method that helps specify the multivariate skewness and / or kurtosis, alongside with some correlation / covariance structure.

About a year ago I took a seminar on copula distributions and I remember the professor casually mentioning that through the use of vine copulas, one could generate data which is, say, symmetric in each one of its 1-D marginals but jointly skewed and vice-versa. Or, even further, that any lower-dimensional margins could have some skewness or kurtosis while keeping the highest dimensions symmetric (or not). I was marveled by the idea that such flexibility could exist I've been trying to find some sort of article or conference paper that describes said method but I have been unsuccessful :(. It doesn't have to be through the use of copulas, I'm open to anything that works.

Edit: I have added some R code to try to show what I mean. So far I am only well-acquainted with Mardia's definition of multivariate skewness and kurtosis. When I first approached my problem I naively thought that if I used a symmetric copula (Gaussian in this case) with skewed marginals (beta, in this example), univariate tests on the marginals would yield significance but Mardia's test for multivarite skewness/kurtosis would be non-significant. I tried that and it didn't come out as I had expected:

library(copula)
library(psych)
set.seed(101)

cop1 <- {mvdc(normalCopula(c(0.5), dim=2, dispstr="un"), 
            c("beta", "beta"),list(list(shape1=0.5, shape2=5), 
            list(shape1=0.5, shape2=5)))}

            Q1 <- rmvdc(cop1, 1000)
            x1 <- Q1[,1]
            y1 <- Q1[,2]


cop2 <- {mvdc(normalCopula(c(0.5), dim=2, dispstr="un"), 
            c("norm", "norm"),list(list(mean=0, sd=1), 
            list(mean = 0, sd=1)))}

            Q2 <- rmvdc(cop2, 1000)
            x2 <- Q2[,1]
            y2 <- Q2[,2]

mardia(Q1)  

Call: mardia(x = Q1)

Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 1000   num.vars =  2 
b1p =  10.33   skew =  1720.98  with probability =  0
small sample skew =  1729.6  with probability =  0
b2p =  22.59   kurtosis =  57.68  with probability =  0

mardia(Q2)
Call: mardia(x = Q2)

Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 1000   num.vars =  2 
b1p =  0.01   skew =  0.92  with probability =  0.92
 small sample skew =  0.92  with probability =  0.92
b2p =  7.8   kurtosis =  -0.79  with probability =  0.43

Upon inspecting the contours for 'cop1' VS 'cop2' as well as the empirical bivariate density plots, I can also see that none of them look symmetric at all. That's when I realized this is probably a little more complicated than I thought.

I know that Mardia's is not the only definition of multivariate skewness/kurtosis, so I'm not limiting myself to finding a method that only satisfies Mardia's definitions.

thank you!

Best Answer

After much searching, jumping around online forums, consulting with professors and doing A LOT of literature review, I have come to the conclusion that probably THE only way to address this problem is through the use of vine copulas indeed. It gives you some control over the pairwise skewness and kurtosis (or any higher moments) - for a p-variate random vector and the freedom to specify p-1 pair of copulas and the remaining p*(p-1)/2 - (p-1) dimensions can be specified in some kind of conditional copula.

I welcome other methods people might've come across but at least I'm going to leave this pointer towards an answer because i cannot, for the life of me, find any other ways to address this.

Related Question