MATLAB: Generating a population randomly starting from some parameters

carlogaussianMATLABmeanmontepopulationrandomvariance

Hello everybody, I am quite a beginner and I am sorry if my question seems trivial, but I hope somebody will help me.
Let's assume I have a quantity Q which is function of n inputs Xi:
Q = f(X1, X2, … Xn)
Now, let's assume that some of these inputs are distributed according to a Gaussian. Thus, for example, X1, X2 and X3 are randomly distributed with a well define mean value and standard deviation, while X4 … Xn are instead assumed to be constant.
I know how to generate randomly the populations X1, X2 and X3 on Matlab, with a command that should implement implicitly a Monte Carlo approach:
pop_X1 = X1_nom + randn(N,1) * X1_dev;
pop_X2 = X2_nom + randn(N,1) * X2_dev;
pop_X3 = X3_nom + randn(N,1) * X3_dev;
However, how do I generate Q taking into account all these input populations variations? Can I simply apply the function f aligning the vectors of X1, X2 and X3 previously generated?
Thanks!!
Paolo

Best Answer

This is sometimes known as statistical tolerancing, to identify the distribution of a function of some random variable, or a set of random variables.
Monte Carlo simulation is indeed one way you can solve the problem, at least approximately so. Thus, just evaluate the function Q at all of those sets of values, then plot a histogram, etc.
There are other methods you can use to compute or approximate the distribution of Q. One can use Taylor series approximations, or even Taguchi methods. (I wrote several papers on the topic of modified Taguchi methods for this exact purpose. Find one of them here in Technometrics .)
A problem is that it is often true that if your function Q is nonlinear, that no common distribution exists to describe the distribution of Q. In those cases, you might use a member of the Pearson or Johnson families of distributions to yield an approximate distribution.
Edit: I pointed out some ideas in my comment below. So lets look at what happens for a simple nonlinear function. How about Q(x) = x^2?
Q = @(x) x.^2;
Really, how simple can you get, and still be nonlinear?
Now, suppose that x is normally distributed, with mean 0 and variance 1. I think that anyone will see, even without any computation that Q(x) cannot look normally distributed, since Q(x) can NEVER take on negative values. It turns out that in this case, Q(X) will have a simple distribution, a Chi-squared random variable, with one degree of freedom.
ezplot(@(x) chi2pdf(x,1),[0,10])
grid on
Sometimes you get lucky. As you can see though, this distribution looks quite non-normmally distributed. The reason is Q(x) has the property that around zero, we cannot simply truncate the Taylor series of x^2 to only the linear term and below. That quadratic terms is hugely significant.
We can try to convince ourselves that I got the distribution right by a simple Monte Carlo simulation.
hist(Q(randn(10000,1)),100)
Yep. Looks close enough for government work to me.
But suppose we changed x slightly? Suppose that x was normally distributed, with mean 100, and variance 1? In fact, the true distribution here should be a non-central chi-square, but I'm feeling too lazy to generate that one. A Monte Carlo simulation will suffice.
hist(Q(100 + randn(100000,1)),100)
Locally, it turns out that the Taylor series of Q(x), expanded around the mean of x (i.e., x==100), is now pretty well approximated by a linear polynomial, so dropping the quadratic term does not hurt us too much. That is, we can write
Q(u + 100) = 10000 + 200*u + u^2
Thus, for moderately small values of u, we can truncate the polynomial to ignore the u^2 term. Q is indeed sufficiently locally linear out there when u is a Normal random variable with mean zero and variance 1. (Then u+100 will clearly have mean 100 and variance 1.)
There is much more we can do with the techniques of statistical tolerancing. I've just brushed the surface here. The multidimensional case, where Q is a function of several random variables is really no more complicated.