r – How to Transition from Joint Distributions to Conditional Distributions in R

distributionsnormal distributionprobabilityr

I was reading the following blog post: https://fabiandablander.com/statistics/Two-Properties.html

Suppose you have a 2 Dimensional (Bivariate) Joint Normal Distribution :

enter image description here

Where:

enter image description here

According to the theory (for multivariate gaussian distribution only), the conditional distribution of X1 given X2 is:

enter image description here

Question: Is it possible to obtain the conditional distribution of the joint distribution using R?

If I have a bivariate normal distribution (e.g. assume no covariances) :

library(bivariate)

f = bmbvpdf (
    mean.X1=10, mean.X2=5,
    
    sd.X1=3 ,   sd.X2=4,
)

plot(f)

enter image description here

Is is possible to use R (or some statistical software) to find out the conditional distribution of X1, i.e. the probability distribution of "X1 given X2"? Is it possible to output the parameters of this distribution (i.e. the mean and the variance) in a matrix or data frame?

I understand that for a simple bivariate normal distribution, this can be done by hand – but for high dimensional multivariate distributions, if you know the exact joint probability distribution : Using R, can you easily determine the conditional distribution from a joint distribution. For example, if you have a multivariate guassian mixture distribution that looks something like this:

enter image description here

If you have the joint probability distribution as a function F(X1,X2,X3, X4, X5) : Can you determine F(X1 | X2,X3, X4, X5) using R?

Thanks!

Best Answer

R is not a symbolic computing language like Mathematica, so it won't output functional forms or expected values etc. If all you want is numerical values, though, just

  1. Numerically integrate $f(x_1, \dots, x_5)$ over the range of $x_1$ to get the constant of integration,
  2. Calculate $f(x_1, \dots, x_5)$ as you normally would,
  3. Divide the result of step 2 by the result of step 1 to get the conditional density.

Other values, e.g., the mean, can be calculated in the obvious way by replacing the calculation in step 2 with the appropriate function times $f(x_1, \dots, x_5)$.

For example:

library(VGAM)

# Bivariate normal parameters
mu <- c(0,1)
v <- c(1,2)
covar <- 0.3

# Condition on x2 = 3
x2 <- 3

# Step 1: Integrate over x1 to get the COI
#    The first parameter of dbinorm is x1, and that's what the integrate
#    function integrates over.  We have to pass all the rest of the 
#    parameters to the integrate function, which passes them on to dbinorm

coi <- integrate(dbinorm, lower=-5, upper=5, x2=x2, mean1=mu[1], mean2=mu[2], var1=v[1], var2=v[2], cov12=covar)$value
    
# Step 2: Define conditional distribution (not strictly necessary)
#    From a programming standpoint, there are better and more elegant ways, but this
#    has the advantage of clarity for less experienced R programmers

dcond <- function(x) {
   dbinorm(x, x2=x2, mean1=mu[1], mean2=mu[2], var1=v[1], var2=v[2], cov12=covar) / coi
}

# Calculate the mean using numerical integration
integrate(function(x){x*dcond(x)}, lower=-5, upper=5)
0.2999965 with absolute error < 7.7e-09

# Calculate the mean using known formulae
correl <- covar / sqrt(v[1] * v[2])
correl * sqrt(v[1]/v[2])*(x2-mu[2])
[1] 0.3

... and, as we hope, the numerical integration calculation gives us essentially the same value for the conditional mean that the (more) exact calculation does.

Related Question