Monte Carlo sampling from correlated empirical distributions

simulation, st.statistics

I have a dataset that contains six correlated variables, and I want to sample new data so that each variable has the same marginal distribution as the original data, and the correlations are also the same.

Unfortunately, the marginal distributions are irregular, so I can't use any standard distributions or procedures. I'm generating a kernel for each one and using the resulting empirical CDF.

How can I sample from the joint distribution?


Additional information:

The data represent different measures of an electricity network's reliability; they are correlated because a "bad day" may mean higher values on multiple measures. The method I've been trying is as follows:

  1. Find the Cholesky decomposition of the correlation matrix of the original data.
  2. Generate independent standard normal variates and multiply them by the Cholesky factor to get correlated normal variates.
  3. Generate the kernels of the original data. Generate a large number of observations (100m) fitting this kernel distribution. Order these observations.
  4. For each data point, find the CDF of the normal variate, and then use the ordered kernel observations to select the observation corresponding to this CDF value – the value of this observation is our new variate from the marginal distribution.
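
The four steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions: it uses NumPy, the function name `gaussian_copula_sample` and the synthetic demo data are made up for illustration, and `np.quantile` applied to the raw observations stands in for the ordered draws from the kernel in step 3.

```python
import math
import numpy as np


def gaussian_copula_sample(data, n_samples, rng):
    """data: (n_obs, n_vars) array; returns an (n_samples, n_vars) array.

    Hypothetical helper, not taken from any library.
    """
    n_vars = data.shape[1]

    # Step 1: Cholesky factor of the correlation matrix (corr = chol @ chol.T).
    corr = np.corrcoef(data, rowvar=False)
    chol = np.linalg.cholesky(corr)

    # Step 2: independent standard normals -> correlated standard normals.
    z = rng.standard_normal((n_samples, n_vars)) @ chol.T

    # Step 4, first half: standard normal CDF maps each variate into [0, 1].
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

    # Steps 3 and 4, second half: look up the matching quantile of each marginal.
    # Assumption: empirical quantiles of the raw data replace the kernel draws.
    out = np.empty_like(z)
    for j in range(n_vars):
        out[:, j] = np.quantile(data[:, j], u[:, j])
    return out


if __name__ == "__main__":
    # Synthetic, skewed, correlated data purely for demonstration.
    demo_rng = np.random.default_rng(0)
    base = demo_rng.standard_normal((500, 3))
    base[:, 1] += 0.8 * base[:, 0]
    demo_data = np.exp(base)

    new = gaussian_copula_sample(demo_data, 5000, demo_rng)
    print(np.corrcoef(demo_data, rowvar=False).round(2))
    print(np.corrcoef(new, rowvar=False).round(2))
```

Comparing the two printed correlation matrices is one way to quantify how far the sampled correlations drift from those of the original data.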

The resulting data have the desired marginal distributions, but the variables don't have the correct correlations. Not sure why…

Best Answer

(Sorry, can't comment - these are just clarifying questions)

1a) What is the goal? To start from some common ground, if all you have is a bunch of 6-tuples, then that is all you have - you can't sample from 'the actual distribution'. My guess is that you have a model (in this case, multivariate normal) and you want to find the best estimates for the parameters in some metric (I really don't know enough to guess what this is here). As stated, it is hard for me to understand what you are going for.

1b) Just to be concrete, what are you hoping to get out of this that you can't get by just sampling-with-replacement from the list of 6-tuples that you already have (this is the same as sampling according to the empirical kernel, if I understood that word correctly)?
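
To make that baseline concrete, sampling with replacement from the observed 6-tuples could look like the sketch below. It assumes the data sit in a NumPy array with one row per observation; the name `resample_rows` is hypothetical.

```python
import numpy as np


def resample_rows(data, n_samples, rng):
    """Draw whole rows with replacement, so every 6-tuple stays intact."""
    idx = rng.integers(0, data.shape[0], size=n_samples)
    return data[idx]
```

Because each draw is an entire observed row, the marginals and the dependence between variables are those of the empirical data, but no values outside the original tuples can ever appear.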

2) Just to make sure, is the 'kernel of the data' some standard estimate of the transition kernel for this 6-state Markov chain? This is probably a silly interpretation.
