Multivariate Distribution – How to Generate Data from a Particular Multivariate Distribution?

conditional probabilitydistributionsmultivariate analysisprobabilitysampling

Let $(X_0,X_1)$ be a random vector distributed according to the CDF $F_{(X_0,X_1)}(x,y)= \min (F_{X_0}(x),F_{X_1}(y))$ where $F_{X_0}(x),F_{X_1}(x)$ are the CDFs of $X_0,X_1$ respectively. We do not necessarily have densities.

I know how to sample from a one-dimensional CDF using inverse transform sampling. I want to sample from $F_{(X_0,X_1)}$

If $F_{(X_0,X_1)}$ had a density, then I could see how this might go:

Generate $x_0$ from $F_{X_0}$
Compute conditional density of $X_1$ given $X_0=x_0$,
and generate $x_1$ from this density

But, here $F_{(X_0,X_1)}$ does not have a density.

What to do ?

Best Answer

If $$\mathbb P(X_0\le x,X_1\le y)=\min \{F_{X_0}(x),F_{X_1}(y)\}$$ then $$\mathbb P(F_0(X_0)\le F_0(x),F_1(X_1)\le F_1(y))=\min \{F_{X_0}(x),F_{X_1}(y)\}$$ Denoting $$U_0=F_0(X_0),\quad U_1=F_1(X_1)$$ this implies that $$\mathbb P(U_0\le u_0,U_1\le u_1) = \min(\mathbb P(U_0\le u_0,u_0<U_1\le u_1) =u_0,u_1)$$ and therefore that, when $u_1\ge u_0$, $$\mathbb P(U_0\le u_0,U_1\le u_1) = u_0$$ meaning that $$\mathbb P(U_0\le u_0,U_1\le u_1) - \mathbb P(U_0\le u_0,U_1\le u_0) = \mathbb P(U_0\le u_0,u_0<U_1\le u_1) =0$$ Similarly, when $u_0>u_1$ $$\mathbb P(u_1<U_0\le u_0,U_1\le u_1) =0$$ So $U_1$ cannot take values larger than $U_0$ and conversely, which implies that$$U_0=U_1$$with probability one.

Conclusion: To draw from this distribution,

generate a Uniform variate $U_0$
transform $U_0$ into $X_0=F_0^{-1}(U_0)$
transform $U_0$ into $X_1=F_1^{-1}(U_0)$

Related Solutions

Time Series – How to Generate Uniformly Distributed Samples with a Given Autocorrelation Function

This problem can be approached by first generating samples from the desired distribution and then reordering them to match the desired autocorrelation function.

The R code below demonstrates an approach based on this answer that can be modified for any desired ACF and distribution. The example generates $n=10^6$ samples from $\text{Gamma}(0.9,1)$ whose ACF follows 50 random samples from $\text{Beta}(1,3)$, sorted descending.

The process is as follows.

Generate the desired number of samples, $X=\{x_1,...,x_n\}$, from the target distribution. Set $\alpha_0$ equal to the desired ACF. Initialize the target ACF, $\alpha$, as the desired ACF.
Find a set of weights that, when passed to filter along with $n$ random normal variates, results in a series, $Y$, with $ACF=\alpha$ (see the answer linked above).
Reorder $X$ so that its rank ordering matches that of $Y$. If $X$ is normally distributed, the resulting series should have the desired ACF; however, the more $X$ deviates from normality, the more the ACF will deviate from $\alpha_0$ (the example below has a target distribution of $\text{Gamma}(0.9,1)$, which is very "non-normal"). Update the target ACF, $\alpha$, according to $\alpha'=\frac{\alpha}{2}\Big(\frac{\alpha_0}{ACF}+1\Big)$ and repeat steps 1-3 until the ACF of the reordered $X$ converges.

The function that performs the reordering (it works only for positive values for alpha):

acf.reorder <- function(x, alpha) {
  tol <- 1e-5
  maxIter <- 10L
  n <- length(x)
  xx <- sort(x)
  y <- rnorm(n)
  w0 <- w <- alpha1 <- alpha
  m <- length(alpha)
  i1 <- sequence((m - 1):1)
  i2 <- sequence((m - 1):1, 2:m)
  i3 <- cumsum((m - 1):1)
  tol10 <- tol/10
  iter <- 0L
  x <- xx[rank(filter(y, w, circular = TRUE))]
  SSE0 <- Inf
  f <- function(ww) {
    sum((c(1, diff(c(0, cumsum(ww[i1]*(ww[i2]))[i3]))/sum(ww^2)) - alpha1)^2)
  }
  ACF <- function(x) acf(x, lag.max = m - 1, plot = FALSE)$acf[1:m]
  
  while ((SSE <- sum((ACF(x) - alpha)^2)) > tol) {
    if (SSE < SSE0) {
      SSE0 <- SSE
      w <- w0
    }
    if ((iter <- iter + 1L) == maxIter) break
    w1 <- w0
    a <- 0
    sse0 <- Inf
    
    while (max(abs(alpha1 - a)) > tol10) {
      a <- c(1, diff(c(0, cumsum(w1[i1]*(w1[i2]))[i3]))/sum(w1^2))
      
      if ((sse <- sum((a - alpha1)^2)) < sse0) {
        sse0 <- sse
        w0 <- w1
      } else {
        # w0 failed to converge; try optim
        w1 <- optim(w0, f, method = "L-BFGS-B")$par
        a <- c(1, diff(c(0, cumsum(w1[i1]*(w1[i2]))[i3]))/sum(w1^2))
        if (sum((a - alpha1)^2) < sse0) w0 <- w1
        break
      }
      
      w1 <- (w1*alpha1/a + w1)/2
    }
    
    x <- xx[rank(filter(y, w0, circular = TRUE))]
    alpha1 <- (alpha1*alpha/ACF(x) + alpha1)/2
  }
  
  xx[rank(filter(y, w, circular = TRUE))]
}

Generate samples from the target distribution and specify the desired ACF:

set.seed(1960841256)
x <- rgamma(1e6, 0.9, 1)
alpha <- c(1, sort(rbeta(50, 1, 3), TRUE))

Reorder x and plot its ACF against alpha:

system.time(x <- acf.reorder(x, alpha))
#>    user  system elapsed 
#>    7.13    0.41    7.53
acf(x, lag.max = length(alpha) - 1)
lines(seq_along(alpha) - 1, alpha, col = "green")

The resulting ACF is a good match with the target, and since x has simply been reordered, it is known to have the desired distribution.

Update: Worst of 100 iterations

The original code was breaking too early for some cases as noted by the OP in the updated question. The above code has been modified to correct the behavior. I also removed the restriction that the ACF be strictly positive.

I ran the same procedure 100 times, each with a new random vector for x and alpha. The longest-running iteration took 7.36 seconds, and the worst performing iteration (by maximum absolute difference of the achieved vs. desired ACF) had a maximum absolute error of 0.056. It is plotted below.

Best Answer

Related Solutions

Time Series – How to Generate Uniformly Distributed Samples with a Given Autocorrelation Function

Update: Worst of 100 iterations

Related Question