MATLAB: Sum of two random variables with different distributions

cdf, random variables

Hi everyone,
I have a kernel cdf for variable x1 (bandwidth = 0.22) and an exponential cdf for variable x2 (lambda = 1/14). Is there a command in MATLAB that allows me to get the cdf for a new variable that is the sum of x1 and x2 (y = x1 + x2)? The range of values for x2 is [0;100] and for x1 is ]0; +∞[.
Thanks in advance!
Best,
Elizabeth

Best Answer

I'm sorry. There is no command in MATLAB that will give you the CDF of the sum of two general random variables. This is for good reason: there is NO simple way to write the CDF of the sum of two general, unrelated random variables with arbitrary distributions.
There are many things we might wish to do that have no simple solutions. That is not to say this is impossible. In fact, there are several classical solution approaches. The general idea is called statistical tolerancing. I'll suggest a couple of ideas that you can use.
1. You can do a Monte Carlo simulation. Generate random samples from each component, then form the sum. You can then compute a sample CDF from the data points. Lots and lots of points here will yield a decent approximation to the CDF.
2. Compute the mean, variance, skewness, kurtosis, etc., of the sum. There are many ways this can be done, using Taylor series approximations or various Taguchi-style methods. For example, as Walter points out, the mean of a sum is simple to compute. However, the higher order moments will take a little more work. Given those moments, you can choose a member of a distribution family that matches them as well as possible. The families usually used for this are the Pearson and the Johnson families.
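As a rough illustration of option #2, here is a minimal sketch of how the first four moments of the sum could be combined. It assumes x1 and x2 are independent, in which case the cumulants simply add. The moments used for x1 are made-up placeholders (you would compute them from your own kernel estimate), and the exponential moments ignore the truncation to [0;100], so treat the numbers as illustrative only.

```matlab
% Moments of each component as [mean, variance, skewness, kurtosis].
% Exponential with mean 14 (lambda = 1/14): skewness 2, kurtosis 9.
% Assumption: the truncation of x2 to [0, 100] is ignored here.
m2 = [14, 14^2, 2, 9];

% Hypothetical placeholder moments for the kernel-smoothed variable x1.
m1 = [6, 9, 0.8, 3.5];

% Convert moments to the first four cumulants. For independent
% variables, the cumulants of a sum are the sums of the cumulants.
toCum = @(m) [m(1), m(2), m(3)*m(2)^1.5, (m(4) - 3)*m(2)^2];
k = toCum(m1) + toCum(m2);

% Convert the summed cumulants back to moments of y = x1 + x2.
my = [k(1), k(2), k(3)/k(2)^1.5, 3 + k(4)/k(2)^2];

fprintf('mean %.4g, variance %.4g, skewness %.4g, kurtosis %.4g\n', my);
```

If you have the Statistics Toolbox, the four values in my are what a Pearson family fit would be matched to. Note that pearsrnd expects the standard deviation, sqrt(my(2)), rather than the variance.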
The above scheme (#2) has its flaws for two distributions that are so different from each other, especially when one of them is bounded. That suggests that no simple Pearson or Johnson member will be a good approximant. So I would suggest #1 as the best approach, to form a completely empirical CDF.
If you do choose option #2, then I would still strongly suggest using a Monte Carlo scheme to validate those results. This is especially true because a Monte Carlo simulation is so easy to do.
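To show how little code option #1 needs, here is a minimal sketch that uses only base MATLAB. Several things in it are my assumptions, not facts from the question: x1data is a made-up stand-in for the data behind your kernel estimate, the kernel is taken to be Gaussian, x1 and x2 are sampled independently, and x2 is drawn from the exponential restricted to [0;100] by inverting its CDF.

```matlab
rng(1);                          % fixed seed so the run is repeatable
n  = 1e6;                        % number of Monte Carlo samples
h  = 0.22;                       % kernel bandwidth from the question
mu = 14;                         % exponential mean, 1/lambda

% Hypothetical placeholder for the data behind the kernel estimate.
x1data = 5 + 10*rand(200, 1);

% Sample x1 from a Gaussian kernel density: pick a data point at
% random, then add kernel noise with standard deviation h.
x1 = x1data(randi(numel(x1data), n, 1)) + h*randn(n, 1);

% Sample x2 from the exponential restricted to [0, 100] by inverting
% the truncated CDF.
Fmax = 1 - exp(-100/mu);         % untruncated CDF at the upper bound
x2 = -mu*log(1 - Fmax*rand(n, 1));

% Empirical CDF of the sum: sorted samples against cumulative fractions.
y  = sort(x1 + x2);
Fy = (1:n)'/n;

% Evaluate the empirical CDF at any point q, for example P(y <= 30).
cdfY = @(q) mean(y <= q);
p30  = cdfY(30);
```

plot(y, Fy) then draws the estimated CDF. With the Statistics Toolbox, fitdist, random and ecdf can replace the hand-written sampling and sorting steps.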