R – How to Generate Correlated Test Data with Bernoulli, Categorical, and Continuous Vectors

categorical data, multivariate analysis, r, random-generation

I want to generate a set of 5 random variables with a specified dependence structure among them and with a dependent variable $Y$. I understand how to generate correlated random variables for the multivariate normal case, but not when mixing different types. What follows is a little more than I need, but I'm hoping someone can give me a general way of solving this problem…

  • $X_1$ and $X_2$ need to be highly correlated Bernoulli variables.
  • $X_3$ needs to take one of 5 categorical values, call them "A"…"E".
  • $X_4$ needs to be normal, and negatively correlated with $X_1$, $X_2$.
  • $X_5$ needs to approximate test scores from $0$ to $100$ with a high skew, so gamma probably. $X_5$ needs to be positively correlated with $X_1$, $X_2$, $X_4$.

Each of these variables must impact a "success/occurrence" Bernoulli distributed variable $Y$.

How would I begin? I would like to enforce correlation both between the values of $X$, and also between each $X$ and $Y$. (The categorical correlations seem particularly confusing to me.)

Best Answer

Copulas are one way to generate dependent (or rank-correlated) data from multivariate distributions that are not necessarily normal. A simple example of doing this in MATLAB is: Simulating Dependent Random Variables Using Copulas. I am not sure whether this can handle categorical variables, though.
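For the mixed-type case in the question, a Gaussian copula can be sketched in base R as below. This is a minimal illustration, not a definitive recipe: the latent correlation matrix `R`, the marginal parameters (Bernoulli probability 0.5, equal category probabilities, gamma shape and scale, capping at 100) and the logistic coefficients for $Y$ are all assumptions chosen for the example. Two caveats apply. First, cutting a latent normal into categories imposes an implicit ordering on "A"…"E". Second, the correlations observed on the discrete outputs will be weaker than the latent values in `R`, so `R` usually has to be tuned.

```r
set.seed(42)
n <- 5000

# Latent (normal-scale) correlation matrix; values are illustrative assumptions.
# Rows/columns correspond to X1..X5.
R <- matrix(c(
   1.0,  0.8, 0.0, -0.4, 0.3,
   0.8,  1.0, 0.0, -0.4, 0.3,
   0.0,  0.0, 1.0,  0.0, 0.3,
  -0.4, -0.4, 0.0,  1.0, 0.3,
   0.3,  0.3, 0.3,  0.3, 1.0), nrow = 5, byrow = TRUE)

# Correlated standard normals: chol() returns upper-triangular U with t(U) %*% U = R.
Z <- matrix(rnorm(n * 5), nrow = n) %*% chol(R)

# Map to dependent uniforms (the copula), then apply each marginal's quantile function.
U <- pnorm(Z)

x1 <- qbinom(U[, 1], size = 1, prob = 0.5)           # Bernoulli
x2 <- qbinom(U[, 2], size = 1, prob = 0.5)           # Bernoulli
x3 <- cut(U[, 3], breaks = seq(0, 1, by = 0.2),      # 5 equally likely categories
          labels = LETTERS[1:5], include.lowest = TRUE)
x4 <- qnorm(U[, 4], mean = 50, sd = 10)              # normal
x5 <- pmin(qgamma(U[, 5], shape = 2, scale = 10), 100)  # skewed score, capped at 100

# Y depends on every X through a logistic model; coefficients are assumptions.
x3_effect <- c(A = 0, B = 0.2, C = 0.4, D = 0.6, E = 0.8)[as.character(x3)]
eta <- -1 + 1.0 * x1 + 0.5 * x2 + x3_effect - 0.03 * (x4 - 50) + 0.02 * x5
y <- rbinom(n, size = 1, prob = plogis(eta))

dat <- data.frame(x1, x2, x3, x4, x5, y)
```

Checking `cor(dat$x1, dat$x2)`, `cor(dat$x4, dat$x1)` and `table(dat$x3, dat$y)` on the result shows how much of the latent dependence survives the transformation to each marginal.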
