Solved – Different ways of generating Geometric distributions

geometric-distributionMATLABmonte carlor

I'm trying a simple Monte Carlo example which led to some confusion about ways of generating random Geometrically distributed values in R.

"A supercomputer is shared by 250 independent subscribers. Each day, each subscriber uses the facility with probability 0.3. The number of tasks sent by each active user has Geometric distribution with parameter 0.15, and each task takes a Gamma(10, 3) distributed computer time (in minutes). Tasks are processed consecutively. What is the probability that all the tasks will be processed, that is, the total requested computer time is less than 24 hours? Estimate this probability, attaining the margin of error ±0.01 with probability 0.99."

The solution is given in pure MATLAB and I will spare you everything unrelated to my question. The Geometric distribution is calculated as

Y=ceil( log(1-rand(X,1))/log(1-q) );

where X is a Binomial variable (the number of active subscribers a given day) and q is the parameter to the Geometric distribution. I translated this into R as

Y <- ceiling(log(1-runif(X))/log(1-q))

and got the expected result (~0.17) when I ran the simulation. However, as I understand it, R, unlike pure MATLAB, can generate Geometrically distributed random variables directly, like so

Y <- rgeom(X, q)

and when I tried that instead the simulation suddenly started returning values (~0.55).

I'm probably missing something obvious but can anyone explain what's going on here? I tried plotting the different distributions given by the expected value of X and they look similar enough but the results are obviously not equal. If any necessary details are missing tell me, and I will add them.

Best Answer

[The underlying issue here is not to do with R, but with the geometric distribution itself (so you could quite reasonably drop "in R" from your title; R simply revealed the problem to you).]

Note that there are two ways of writing the geometric distribution; one as the number of failures to the first success, and the other as the number of trials to the first success.

(Wikipedia gives both forms).

It's important to make sure you're dealing with the right form of the geometric.

You can see that the first code you gave produces a minimum value of 1, so it's of the "number of trials" form. R's geometric random number function produces a minimum value of 0, which is of the "number of failures" form.

To convert from the "number of failures" form to the "number of trials" form, you simply add 1.

An essentially identical issue occurs more generally with the negative binomial distribution -- again you have to be careful to use the right one.

Related Question