Can any Probability Distribution be Integrated?

Tags: distributions, normal distribution, probability, r

Suppose I have a mixture model. A mixture model can be viewed as a weighted sum of probability density functions, where the weights are non-negative and sum to 1. Combining several densities lets a mixture model capture patterns in the data that a single probability distribution could not explain as easily. For example, several Normal distributions can be combined into a mixture model (note: perhaps contrary to intuition, a mixture of two Normal distributions is not, in general, itself a Normal distribution).
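To illustrate the note about mixtures not being Normal, here is a minimal one-dimensional sketch in R. The weights (0.3, 0.7) echo the question, but the component means (-2 and 2) and the name `dmix` are assumptions chosen only to make the two peaks easy to see.

```r
# Two-component 1-D mixture density; the means -2 and 2 are illustrative assumptions.
dmix <- function(x) {
  0.3 * dnorm(x, mean = -2, sd = 1) + 0.7 * dnorm(x, mean = 2, sd = 1)
}

# It is still a valid density: the total area is 1.
total_mass <- integrate(dmix, -Inf, Inf)$value

# A Normal density has a single peak; this mixture dips between its two peaks.
dip_between_modes <- dmix(0) < dmix(-2) && dmix(0) < dmix(2)

print(total_mass)
print(dip_between_modes)
```

The dip at 0 is enough to rule out a Normal shape, even though the function integrates to 1 like any other density.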

Suppose I want to create the following Mixture Model by combining two Normal Distributions:

**P(x,y,z) = 0.3 * N1(x,y,z) + 0.7 * N2(x,y,z)**

**N1(x,y,z) ~ Normal with mu = (1,1,1) and sigma = (1,0,0; 0,1,0; 0,0,1)**

**N2(x,y,z) ~ Normal with mu = (1,0,1) and sigma = (1,0,0; 0,1,0; 0,0,1)**

Question: If I fix the value of x = 2, can I use Monte Carlo integration to integrate the resulting conditional probability distribution over some arbitrary range of y and z?

I tried to demonstrate this using the R programming language.

Part 1: I defined the conditional distribution of this mixture model for x = 2:

    #define constants needed for the multivariate normal
    sigma1 <- c(1,0,0,0,1,0,0,0,1)
    sigma <- matrix(sigma1, nrow = 3, ncol = 3, byrow = TRUE)
    sigma_inv <- solve(sigma)
    sigma_det <- det(sigma)
    denom = sqrt( (2*pi)^4 * sigma_det)

    #mixture model
    target <- function(y,z)
    {
      x_one = 2 - 1
      x_two = y - 1
      x_three = z - 1

      x_t = c(x_one, x_two, x_three)
      x_t_one <- matrix(x_t, nrow = 3, ncol = 1, byrow = TRUE)
      x_t_two = matrix(x_t, nrow = 1, ncol = 3, byrow = TRUE)

      num = exp(-0.5 * x_t_two %*% sigma_inv %*% x_t_one)

      answer_1 = num/denom

      x_one2 = 2 - 1
      x_two2 = y - 1
      x_three2 = z - 1

      x_t2 = c(x_one2, x_two2, x_three2)
      x_t_one2 <- matrix(x_t2, nrow = 3, ncol = 1, byrow = TRUE)
      x_t_two2 = matrix(x_t2, nrow = 1, ncol = 3, byrow = TRUE)

      # In this part, as it's (x-mu)^T * SIGMA * (x-mu)
      num2 = exp(-0.5 * x_t_two2 %*% sigma_inv %*% x_t_one2)

      answer_2 = num2/denom
      return(0.3*answer_1 + 0.7*answer_2)
    }

Part 2: I then attempted to integrate this conditional probability distribution using Monte Carlo Integration:

    means_vec <- c()
    for (j in 1:10000) {
        set.seed(j)
        u_y <- runif(10000)
        u_z <- runif(10000)
        sim_vec <- c()
        for (i in 1:length(u_y)) {
            sim_vec[i] <- target(y = u_y[i], z = u_z[i])
        }
        means_vec[j] <- mean(sim_vec)
    }
    print(means_vec)

The code above draws 10,000 random points from a uniform distribution for "y" and "z" and evaluates the function (i.e. the conditional mixture distribution) at each of them. The Monte Carlo estimate of the integral is then the average of the function values at these 10,000 points. This entire process is repeated 100 times, and the integral of the conditional distribution is taken to be the average of the estimates from these 100 iterations:

    head(means_vec)

    [1] 0.01124573 0.01123975 0.01123244 0.01123575 0.01124451

    #final answer of the integral
    mean(means_vec)

    [1] 0.01125189
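The estimator described above can be sketched on a one-dimensional integral whose exact value is known, which makes the result easy to check. The integrand, the limits `a` and `b`, and the sample size are assumptions for illustration only. Note that the average is multiplied by the length of the interval; with `runif`'s default range [0, 1] that factor is 1.

```r
set.seed(1)

g <- function(x) dnorm(x)   # integrand with a known integral (illustrative choice)
a <- -1                     # lower limit (assumed)
b <- 2                      # upper limit (assumed)
n <- 100000                 # number of uniform draws

u <- runif(n, min = a, max = b)

# Monte Carlo estimator: interval length times the average function value.
estimate <- (b - a) * mean(g(u))

# Standard error of the estimate, from the sample standard deviation.
std_error <- (b - a) * sd(g(u)) / sqrt(n)

# Exact value from the Normal CDF, for comparison.
exact <- pnorm(b) - pnorm(a)

print(c(estimate = estimate, std_error = std_error, exact = exact))
```

Reporting the standard error alongside the estimate gives a rough idea of how many digits can be trusted, which averaging repeated runs does not show directly.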

Can someone please tell me if what I have done is correct? In practice, is this how the conditional distribution of a mixture model is integrated?

Best Answer

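There are four small errors:

1. In computing the denominator you raise 2*pi to the power 4 instead of 3 (the dimension of the distribution).
2. The second component uses the wrong mean vector: x_two2 = y - 1 should be x_two2 = y - 0, because the mean of N2 is (1,0,1).
3. You integrate over the joint distribution f(x,y,z) instead of the conditional distribution f(x,y,z)/f(x). In the corrected code below this is handled by simply dividing by dnorm(1), which can be done because x is independent of y and z.
4. The range of integration is small (the runif calls only sample from [0,1]). The corrected code samples from a box of side box_length and multiplies the average by the box area, box_length^2.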
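The division by dnorm(1) can be checked by hand. The short derivation below assumes the identity covariance from the question and writes $\varphi$ for the standard Normal density (a symbol introduced here, equal to what dnorm computes in R). With identity covariance each component factorizes into one-dimensional densities, so

$$f(x,y,z) = \varphi(x-1)\,\bigl[0.3\,\varphi(y-1) + 0.7\,\varphi(y)\bigr]\,\varphi(z-1)$$

Integrating out $y$ and $z$ gives the marginal density of $x$ at the fixed value:

$$f_X(2) = \varphi(2-1)\,(0.3 + 0.7) = \varphi(1)$$

and therefore

$$f(y,z \mid x=2) = \frac{f(2,y,z)}{f_X(2)} = \bigl[0.3\,\varphi(y-1) + 0.7\,\varphi(y)\bigr]\,\varphi(z-1)$$

This conditional density integrates to 1 over the whole $(y,z)$ plane, so an estimate over a large box should come out close to 1.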

    # define constants needed for the multivariate normal
    sigma1 <- c(1,0,0,0,1,0,0,0,1)
    sigma <- matrix(sigma1, nrow = 3, ncol = 3, byrow = TRUE)
    sigma_inv <- solve(sigma)
    sigma_det <- det(sigma)
    denom = sqrt( (2*pi)^3 * sigma_det)   # power 3: the distribution is 3-dimensional

    # mixture model: joint density f(x = 2, y, z)
    target <- function(y,z)
    {
      # first component, mean (1,1,1)
      x_one = 2 - 1
      x_two = y - 1
      x_three = z - 1

      x_t = c(x_one, x_two, x_three)
      x_t_one <- matrix(x_t, nrow = 3, ncol = 1, byrow = TRUE)
      x_t_two = matrix(x_t, nrow = 1, ncol = 3, byrow = TRUE)

      # quadratic form (x-mu)^T * SIGMA^-1 * (x-mu)
      num = exp(-0.5 * x_t_two %*% sigma_inv %*% x_t_one)

      answer_1 = num/denom

      # second component, mean (1,0,1)
      x_one2 = 2 - 1
      x_two2 = y - 0
      x_three2 = z - 1

      x_t2 = c(x_one2, x_two2, x_three2)
      x_t_one2 <- matrix(x_t2, nrow = 3, ncol = 1, byrow = TRUE)
      x_t_two2 = matrix(x_t2, nrow = 1, ncol = 3, byrow = TRUE)

      # quadratic form (x-mu)^T * SIGMA^-1 * (x-mu)
      num2 = exp(-0.5 * x_t_two2 %*% sigma_inv %*% x_t_one2)

      answer_2 = num2/denom
      return(0.3*answer_1 + 0.7*answer_2)
    }

    means_vec <- c()
    for (j in 1:10) {
        set.seed(j)
        box_length = 10
        # sample y and z uniformly from a box centred at 0
        u_y <- runif(10000, -box_length/2, box_length/2)
        u_z <- runif(10000, -box_length/2, box_length/2)
        sim_vec <- numeric(length(u_y))
        for (i in 1:length(u_y)) {
            sim_vec[i] <- target(y = u_y[i], z = u_z[i])
        }
        means_vec[j] <- mean(sim_vec)
    }
    # multiply by the box area and divide by f(x = 2) = dnorm(1)
    print(means_vec*box_length^2/dnorm(1))

    mean(means_vec)*box_length^2/dnorm(1)
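As a cross-check on the corrected code, here is a vectorized sketch of the same estimate next to the exact value. It is not the answer's code: the product form in `cond_density` relies on the identity covariance of both components, and the names `cond_density`, `mass`, and `half` are introduced here for illustration.

```r
# Conditional density f(y, z | x = 2); the product form assumes identity covariance.
cond_density <- function(y, z) {
  (0.3 * dnorm(y, mean = 1) + 0.7 * dnorm(y, mean = 0)) * dnorm(z, mean = 1)
}

set.seed(1)
box_length <- 10
half <- box_length / 2
n <- 100000

# Uniform draws over the box [-half, half] x [-half, half].
u_y <- runif(n, -half, half)
u_z <- runif(n, -half, half)

# Monte Carlo estimate: box area times the average density over the box.
mc_estimate <- box_length^2 * mean(cond_density(u_y, u_z))

# Exact probability of the same box, from Normal CDFs.
mass <- function(mu) pnorm(half, mean = mu) - pnorm(-half, mean = mu)
exact <- (0.3 * mass(1) + 0.7 * mass(0)) * mass(1)

print(c(mc_estimate = mc_estimate, exact = exact))
```

Because `dnorm` is vectorized, no explicit loop over the sample points is needed, and the exact value gives a target for judging the Monte Carlo error.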

Related Solutions

Solved – Conditional Expected Value of Product of Normal and Log-Normal Distribution

What is the intended use of the result? That bears on what form of answer is needed, including whether a stochastic (Monte Carlo) simulation approach might be adequate. There is also the bigger-picture question of whether this problem needs to be solved at all: if someone came up with it as a way of solving a higher-level problem, there might be a better approach to that higher-level problem which doesn't require this.

Here is a stochastic (Monte Carlo) simulation solution in MATLAB.

    a = 1; b = 2; c = 3; d = 4; k = -1; % Made up values for illustrative purposes
    n = 1e8; % Number of replications
    mux = 10; sigmax = 4; sigmay = 7; % Made up values for illustrative purposes
    X = mux + sigmax * randn(n,1); Y = sigmay * randn(n,1); Y1 = a + b + c + d * Y;
    success_index = exp(X).*Y1 > 0; % replications in which condition is true
    num_success = sum(success_index);
    Cond_Sample = exp(X(success_index)) .* Y1(success_index) + k;
    disp([num_success mean(Cond_Sample) std(Cond_Sample)/sqrt(num_success)])

    1.0e+09 *
    0.058475265000000   1.502775087443930   0.057342191058931
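For readers following the rest of this page in R, the same simulation might be sketched as below. This is a hedged port, not the original author's code: the constants are the same made-up values, `c0` replaces the name `c` to avoid shadowing R's `c()` function, and `n` is reduced to 1e6 so the sketch runs quickly.

```r
set.seed(1)

# Made-up constants copied from the MATLAB snippet; c0 stands in for its c.
a <- 1; b <- 2; c0 <- 3; d <- 4; k <- -1
n <- 1e6                            # fewer replications than the MATLAB run (assumption)
mux <- 10; sigmax <- 4; sigmay <- 7

X <- rnorm(n, mean = mux, sd = sigmax)
Y <- rnorm(n, mean = 0, sd = sigmay)
Y1 <- a + b + c0 + d * Y

# Keep only the replications in which the condition holds.
keep <- exp(X) * Y1 > 0
num_success <- sum(keep)
cond_sample <- exp(X[keep]) * Y1[keep] + k

# Count, conditional mean, and standard error of that mean.
result <- c(num_success = num_success,
            mean = mean(cond_sample),
            std_error = sd(cond_sample) / sqrt(num_success))
print(result)
```

Since exp(X) is always positive, the condition reduces to Y1 > 0, so the success fraction should be close to pnorm(6/28). The conditional mean itself is dominated by rare large values of exp(X), so it varies noticeably from run to run.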

Solved – How can a probability distribution diverge

> Somehow, if you would take the area of a diverging Gamma distribution, you could express it as the area of a dirac delta distribution, plus something more since it has non zero weight at $x \neq 0$, so it would be bigger than one.

That's where your reasoning goes wrong: you can't automatically express any function which is infinite at $x = 0$ as a delta distribution plus something more. After all, if you could do this with $\delta(x)$, who's to say you couldn't also do it with $2\delta(x)$? Or $10^{-10}\delta(x)$? Or any other coefficient? It's just as valid to say that those distributions are zero for $x\neq 0$ and infinite at $x = 0$; why not use the same reasoning with them?
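A quick numerical look at a diverging Gamma density may help here. The sketch below assumes shape 0.5 (the text gives no parameters; any shape below 1 diverges at zero) and uses R's built-in `dgamma` and `pgamma`.

```r
shape <- 0.5   # assumed shape parameter; the density diverges at 0 for shape < 1

# The density grows without bound as x approaches 0 ...
x_near_zero <- c(1e-2, 1e-6, 1e-12)
density_values <- dgamma(x_near_zero, shape = shape)

# ... yet the total area is exactly 1 (CDF evaluated at infinity),
total_area <- pgamma(Inf, shape = shape)

# and the sliver next to the singularity carries very little of it.
area_near_zero <- pgamma(1e-6, shape = shape)

print(density_values)
print(c(total_area = total_area, area_near_zero = area_near_zero))
```

So an infinite value at a single point does not, by itself, contribute a finite chunk of area the way a delta distribution would.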

Actually, distributions (in the mathematical sense of distribution theory) should be thought of more like functions of functions - you put in a function and get out a number. For the delta distribution specifically, if you put in the function $f$, you get out the number $f(0)$. Distributions are not normal number-to-number functions. They're more complicated, and more capable, than such "ordinary" functions.

This idea of turning a function into a number is quite familiar to anyone who's used to dealing with probability. For example, the usual summary statistics of a distribution (mean, variance, skewness, kurtosis, and so on) can all be thought of as rules that turn a function (the probability distribution) into a number (the corresponding statistic). Take the mean/expectation value, for instance. This rule turns a probability distribution $P(x)$ into the number $E_P[x]$, calculated as $$E_P[x] = \int P(x)\,x\ \mathrm{d}x$$ Or the rule for variance turns $P(x)$ into the number $\sigma_P^2$, where $$\sigma_P^2 = \int P(x)\,(x - E_P[x])^2\ \mathrm{d}x$$ My notation is a little weird here, but hopefully you get the idea.¹
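The two rules above can be tried out numerically. In this sketch the density `P` is an assumed example (a Normal with mean 2 and standard deviation 3), and R's `integrate` stands in for the integrals.

```r
# Example density to feed into the rules (an assumption for illustration).
P <- function(x) dnorm(x, mean = 2, sd = 3)

# Rule 1: the mean, i.e. the integral of P(x) * x.
mean_P <- integrate(function(x) P(x) * x, -Inf, Inf)$value

# Rule 2: the variance, i.e. the integral of P(x) * (x - mean)^2.
var_P <- integrate(function(x) P(x) * (x - mean_P)^2, -Inf, Inf)$value

print(c(mean = mean_P, variance = var_P))
```

Each rule takes the whole function `P` as input and returns a single number, which is the sense in which a moment is a "function of a function".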

You may notice something these rules have in common: in all of them, the way you get from the function to the number is by integrating the function times some other weighting function. This is a very common way to represent mathematical distributions. So it's natural to wonder, is there some weighting function $\delta(x)$ that allows you to represent the action of a delta distribution like this? $$f\to \int \delta(x)\, f(x)\ \mathrm{d}x$$ You can easily establish that if there is such a function, it has to be equal to $0$ at every $x\neq 0$. But you can't get a value for $\delta(0)$ in this way. You can show that it's larger than any finite number, but there is no actual value for $\delta(0)$ that makes this equation work out, using the standard ideas of integration.²

The reason for that is that there's more to the delta distribution than just this: $$\begin{cases}0, & x\neq 0 \\ \infty, & x = 0\end{cases}$$ That "$\infty$" is misleading. It stands in for a whole extra set of information about the delta distribution that normal functions just can't represent. And that's why you can't meaningfully say that the gamma distribution is "more" than the delta distribution. Sure, at any $x > 0$, the value of the gamma distribution is more than the value of the delta distribution, but all the useful information about the delta distribution is locked up in that point at $x = 0$, and that information is too rich and complex to allow you to say that one distribution is more than the other.
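One way to get a feel for this, offered as an aside rather than as part of the argument above, is to use ever narrower Normal densities as the weighting function. The sketch assumes a test function `f` (here cosine) and approximates the integral with a simple Riemann sum on a fine grid.

```r
f <- function(x) cos(x)        # test function (illustrative); f(0) = 1

h <- 1e-4                      # grid spacing for the Riemann sum
x <- seq(-1, 1, by = h)

widths <- c(0.1, 0.03, 0.01)   # standard deviations of the narrowing weights

# Integral of weight(x) * f(x) for each width; it approaches f(0).
approx_f0 <- sapply(widths, function(eps) sum(dnorm(x, sd = eps) * f(x)) * h)

# Height of each weighting function at x = 0; it grows without bound.
peak_height <- dnorm(0, sd = widths)

print(rbind(widths, approx_f0, peak_height))
```

The number returned settles down to f(0) while the peak height keeps growing, which suggests that the action on f, not the value at zero, is the meaningful part.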

Technical details

¹Actually, you can flip things around and think of the probability distribution itself as the mathematical distribution. In this sense, the probability distribution is a rule that takes a weighting function, like $x$ or $(x - E[x])^2$, to a number, $E[x]$ or $\sigma_x^2$ respectively. If you think about it that way, the standard notation makes a bit more sense, but I think the overall idea is a bit less natural for a post about mathematical distributions.

²Specifically, by "standard ideas of integration" I'm talking about Riemann integration and Lebesgue integration, both of which have the property that two functions which differ only at a single point must have the same integral (given the same limits). If there were a function $\delta(x)$, it would differ from the function $0$ at only one point, namely $x = 0$, and thus the two functions' integrals would always have to be the same. $$\int_a^b \delta(x)f(x)\ \mathrm{d}x = \int_a^b (0)f(x)\ \mathrm{d}x = 0$$ So there is no number you can assign to $\delta(0)$ that makes it reproduce the effect of the delta distribution.
